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INTRODUCTION 


1.1 UNCERTAINTY IN ROBOTICS 


Robotics is the science of perceiving and manipulating the physical world through 
computer-controlled mechanical devices. Examples of successful robotic systems in- 
clude mobile platforms for planetary exploration [], robotics arms in assembly lines [], 
cars that travel autonomously on highways [], actuated arms that assist surgeons []. 
Robotics systems have in common that they are are situated in the physical world, per- 
ceive their environments through sensors, and manipulate their environment through 
things that move. 


While much of robotics is still in its infancy, the idea of “intelligent” manipulating 
devices has an enormous potential to change society. Wouldn’t it be great if all our 
cars were able to safely steer themselves, making car accidents a notion of the past? 
Wouldn’t it be great if robots, and not people, would clean up nuclear disasters sites 
like Chernobyl? Wouldn’t it be great if our homes were populated by intelligent ser- 
vice robots that would carry out such tedious tasks as loading the dishwasher, and 
vacuuming the carpet, or walking our dogs? And lastly, a better understanding of 
robotics will ultimately lead to a better understanding of animals and people. 


Tomorrows application domains differ from yesterdays, such as manipulators in as- 
sembly lines that carry out the identical task day-in day-out. The most striking char- 
acteristic of the new robot systems is that they operate in increasingly unstructured 
environments, environments that are inherently unpredictable. An assembly line is or- 
ders of magnitude more predictable and controllable than a private home. As a result, 
robotics is moving into areas where sensor input becomes increasingly important, and 
where robot software has to be robust enough to cope with a range of situations—often 
too many to anticipate them all. Robotics, thus, is increasingly becoming a software 
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science, where the goal is to develop robust software that enables robots to withstand 
the numerous challenges arising in unstructured and dynamic environments. 


This book focuses on a key element of robotics: Uncertainty. Uncertainty arises if the 
robot lacks critical information for carrying out its task. It arises from five different 
factors: 


]. Environments. Physical worlds are inherently unpredictable. While the degree 
of uncertainty in well-structured environments such assembly lines is small, en- 
vironments such as highways and private homes are highly dynamic and unpre- 
dictable. 


2. Sensors. Sensors are inherently limited in what they can perceive. Limitations 
arise from two primary factors. First, range and resolution of a sensor is sub- 
ject to physical laws. For example, Cameras can't see through walls, and even 
within the perceptual range the spatial resolution of camera images is limited. 
Second, sensors are subject to noise, which perturbs sensor measurements in un- 
predictable ways and hence limits the information that can be extracted from 
sensor measurements. 


3. Robots. Robot actuation involves motors that are, at least to some extent, unpre- 
dictable, due effects like control noise and wear-and-tear. Some actuators, such 
as heavy-duty industrial robot arms, are quite accurate. Others, like low-cost 
mobile robots, can be extremely inaccurate. 


4. Models. Models are inherently inaccurate. Models are abstractions of the real 
world. As such, they only partially model the underlying physical processes of 
the robot and its environment. Model errors are a source of uncertainty that has 
largely been ignored in robotics, despite the fact that most robotic models used 
in state-or-the-art robotics systems are rather crude. 


5. Computation. Robots are real-time systems, which limits the amount of com- 
putation that can be carried out. Many state-of-the-art algorithms (such as most 
of the algorithms described in this book) are approximate, achieving timely re- 
sponse through sacrificing accuracy. 


АП of these factors give rise to uncertainty. Traditionally, such uncertainty has mostly 
been ignored in robotics. However, as robots are moving away from factory floors into 
increasingly unstructured environments, the ability to cope with uncertainty is critical 
for building successful robots. 
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1.2 PROBABILISTIC ROBOTICS 


This book provides a comprehensive overview of probabilistic algorithms for robotics. 
Probabilistic robotics is a new approach to robotics that pays tribute to the uncertainty 
in robot perception and action. They key idea of probabilistic robotics is to represent 
uncertainty explicitly, using the calculus of probability theory. Put differently, instead 
of relying on a single “best guess” as to what might be the case in the world, prob- 
abilistic algorithms represent information by probability distributions over a whole 
space of possible hypotheses. By doing so, they can represent ambiguity and degree 
of belief in a mathematically sound way, enabling them to accommodate all sources 
of uncertainty listed above. Moreover, by basing control decisions on probabilistic 
information, these algorithms degrade nicely in the face of the various sources of un- 
certainty described above, leading to new solutions to hard robotics problems. 


Let us illustrate the probabilistic approach with a motivating example: mobile robot 
localization. Localization is the problem of estimating a robot’s coordinates in an ex- 
ternal reference frame from sensor data, using a map of the environment. Figure 1.1 
illustrates the probabilistic approach to mobile robot localization. The specific local- 
ization problem studied here is known as global localization, where a robot is placed 
somewhere in the environment and has to localize itself from scratch. In the proba- 
bilistic paradigm, the robot’s momentary estimate (also called belief) is represented 
by a probability density function over the space of all locations. This is illustrated 
in the first diagram in Figure 1.1, which shows a uniform distribution (the prior) that 
corresponds to maximum uncertainty. Suppose the robot takes a first sensor mea- 
surement and observes that it is next to a door. The resulting belief, shown in the 
second diagram in Figure 1.1, places high probability at places next to doors and low 
probability elsewhere. Notice that this distribution possesses three peaks, each corre- 
sponding to one of the (indistinguishable) doors in the environment. Furthermore, the 
resulting distribution assigns high probability to three distinct locations, illustrating 
that the probabilistic framework can handle multiple, conflicting hypotheses that natu- 
rally arise in ambiguous situations. Finally, even non-door locations possess non-zero 
probability. This is accounted by the uncertainty inherent in sensing: With a small, 
non-zero probability, the robot might err and actually not be next to a door. Now 
suppose the robot moves. The third diagram in Figure 1.1 shows the effect of robot 
motion on its belief, assuming that the robot moved as indicated. The belief is shifted 
in the direction of motion. It is also smoothed, to account for the inherent uncertainty 
in robot motion. Finally, the fourth and last diagram in Figure 1.1 depicts the belief 
after observing another door. This observation leads our algorithm to place most of 
the probability mass on a location near one of the doors, and the robot is now quite 
confident as to where it is. 
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Figure 1.1 The basic idea of Markov localization: A mobile robot during global localiza- 
tion. 
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This example illustrates the probabilistic paradigm in the context of a specific per- 
ceptual problem. Stated probabilistically, the robot perception problem is a state es- 
timation problem, and our localization example uses an algorithm known as Bayes 
filter for posterior estimation over the space of robot locations. Similarly, when select- 
ing actions, probabilistic approaches considers the full uncertainty, not just the most 
likely guess. By doing so, the probabilistic approach trades off information gathering 
(exploration) and exploitation, and act optimally relative to the state of knowledge. 


1.3 IMPLICATIONS 


What are the advantages of programming robots probabilistically, when compared to 
other approaches that do not represent uncertainty explicitly? Our central conjecture 
is nothing less than the following: 


A robot that carries a notion of its own uncertainty and that acts accordingly 
is superior to one that does not. 


In particular, probabilistic approaches are typically more robust in the face of sensor 
limitations, sensor noise, environment dynamics, and so on. They often scale much 
better to complex and unstructured environments, where the ability to handle uncer- 
tainty is of even greater importance. In fact, certain probabilistic algorithms are cur- 
rently the only known working solutions to hard robotic estimation problems, such as 
the kidnapped robot problem, in which a mobile robot must recover from localization 
failure; or the problem of building accurate maps of very large environments, in the 
absence of a global positioning device such as GPS. Additionally, probabilistic algo- 
rithms make much weaker requirements on the accuracy of models than many classical 
planning algorithms do, thereby relieving the programmer from the unsurmountable 
burden to come up with accurate models. Viewed probabilistically, the robot learning 
problem is a long-term estimation problem. Thus, probabilistic algorithms provide a 
sound methodology for many flavors of robot learning. And finally, probabilistic al- 
gorithms are broadly applicable to virtually every problem involving perception and 
action in the real world. 


However, these advantages come at a price. Traditionally, the two most frequently 
cited limitations of probabilistic algorithms are computational inefficiency, and a 
need to approximate. Probabilistic algorithms are inherently less efficient than non- 
probabilistic ones, due to the fact that they consider entire probability densities. The 
need to approximate arises from the fact that most robot worlds are continuous. Com- 
puting exact posterior distributions is typically infeasible, since distributions over the 
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continuum possess infinitely many dimensions. Sometimes, one is fortunate in that 
the uncertainty can approximated tightly with a compact parametric model (e.g., dis- 
crete distributions or Gaussians); in other cases, such approximations are too crude 
and more complicated representations most be employed. Recent research has suc- 
cessfully led to a range of are computationally efficient probabilistic algorithms, for a 
range of hard robotics problems—many of which are described in depth in this book. 
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This book attempts to provide a comprehensive and in-depth introduction into proba- 
bilistic robotics. The choice of material is somewhat biased towards research carried 
out at Carnegie Mellon University, the University of Bonn, and affiliated labs. How- 
ever, we have attempted to include in-depth descriptions of other, important proba- 
bilistic algorithms. The algorithms described here have been developed for mobile 
robots; however, many of them are equally applicable to other types of robots. Thus, 
the coverage of the material is by no means complete; probabilistic ideas have recently 
become extremely popular in robotics, and a complete description of the field would 
simply not fit into a single book. However, we believe that the choice of material is 
representative for the existing body of literature. 


The goal of the book is to provide a systematic introduction into the probabilistic 
paradigm, from the underlying mathematical framework to implementation. For each 
major algorithm, this book provides 


=  acomplete mathematical derivation, 
m pseudo-code in a C-like language, 
m discussions of implementation details and potential pitfalls, and 


m empirical results obtained in fielded systems. 


We believe that all four items are essential for obtaining a deep understanding of the 
probabilistic paradigm. At the end of each chapter, the book also provides biblio- 
graphical notes and a list of questions and exercises. 


The book has been written with researchers, graduate students or advanced undergrad- 
uate students in mind, specializing in robotics or applied statistics. We have attempted 
to present the material in a way that requires a minimum of background knowledge. 
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However, basic knowledge of probability theory will almost certainly help in under- 
standing the material. The various mathematical derivations can easily be skipped at 
first reading. However, we strongly recommend to take the time and study the mathe- 
matical derivations, as a profound mathematical understanding is will almost certainly 
lead to deep and important insights into the working of the probabilistic approach to 
robotics. 


If used in the classroom, each chapter should be covered in one or two lectures; how- 
ever, we recommend that the study of the book be accompanied by practical, hands-on 
experimentation as directed by the questions and exercises at the end of each chapter. 


This book is organized in four major parts. 


m The first part, Chapters 2 through 5, discuss the basic mathematical framework 
that underlies all of the algorithms described in this book. Chapters 2 through 4 
introduce the basic probabilistic notation and describes a collection of filters for 
probabilistic state estimation. Chapter 5 discusses specific probabilistic models 
that characterize mobile robot perception and motion. 


m Тһе second part, which comprised Chapters 7 to ??, describes a range of per- 
ceptual algorithms, which map sensor measurements into internal robot beliefs. 
In particular, Chapter 7 describes algorihtms for mobile robot localization, fol- 
lowed by algorithms for map acquisition described in Chapters ??. This part also 
contains a chapter on learning models. 


m The third part, in Chapters ?? to ??, introduces probabilistic planning and action 
selection algorihtms. 


= Finally, Chapter ?? describes two robot systems that were controlled by proba- 
bilistic algorithms. These robots were deployed in museums as interactive tour- 
guide robots, where they managed to navigate reliably without the need to modify 
the museums in any way. 


The book is best read in order, from the beginning to the end. However, we have 
attempted to make each individual chapter self-explanatory. 


1.5 BIBLIOGRAPHICAL REMARKS 


The term ‘tobot” was invented in 1921 by the Czech novelist Karel Capek [42], to describe a willing, 
intelligent and human-like machines that make life pleasant by doing the type work we don't like to do. In 
the Fourties, Asimov coined the term 'tobotics" and postulated the famous three laws of robotics [1, 2], 


CHAPTER 1 


Robotics, as a scientifi c discipline has been an active fi eld of research foer several decades. In the early 
years, most of the research focused on 


Major trends in robotics: 


Seventies: classical (deliberate) approach, accurate models, no uncertainty, no sensing. Still: very 
hard problem. Reif: planning problem NP hard, but only doubly exponential algorithms known. 
Canny: fi rst single exponential planning algorithm. Latombe: Many impressive randomized planning 
algorithms. Important difference: randomization used for search, not for representing uncertainty. 
Planning algorithm for special topologies: Schwartz, and Sharir. Some take sensor data into account. 
Koditschek navigation function for feedback control, generalizes Khatib's potential fi elds, which suf- 
fer from local minima (navigation functions don't). 


Mid-Eighties: reactive approach, rejection of models, pure reliance on sensors, uncertainty (if any) 
handled by feedback control. Relies on sensor data carrying suffi cient information for action selec- 
tion, therefore uncertainty not an issue. Typically confi ned to small environments, where everything 
of importance can be perceived by the sensors. Nevertheless, some impressive results for control of 
legged robots. Brooks, Mataric, Steels, Connell, and many others. Biologically inspired robotics. 
Smithers. 


Mid-Nineties: Hybrid approaches, reactive at low level (fast decision cycle), deliberate at higher 
levels. Uses sensing mostly at low level, and models at high level. Gat, Arkins, Balch, Firby, Simmons 
and many others. harvests best of both worlds: robust through reaction, but can do more powerful 
tasks due to deliberation. 


Late Ninties: Probabilistic robotics, different way to integrate models and sensor data. 


Probabilistic robotics can be traced back to Sixties to advent of Kalman fi ters, which has been used exten- 
sively in robotics. First serious advance is Smith and Cheeseman in mid-80s, who proppose an algorithm for 
concurrent mapping and localization based on Kalman fi Iters that is now in widespread use. Durrant- Whyte, 
Leonard, Castellanos, and many others. 


Main activity in past fi ve years. Main advances: more fbxible representations (beyond Kalman fi Iters), 
more effi cient algorithms. Statistical approaches for solving hard correspondence problems. 


What is new in probabilistic robotics, relative to approaches above? 


seamlessly blends models and perception in a novel way 
sound mathematical theory, clear assumptions, therefore it's easier to predict failure modes 
applies to all levels: lowest to highest, since uncertainty arises everywhere 


currently the best known solutions in a range of hard robotics problems 


RECURSIVE STATE ESTIMATION 


2.1 INTRODUCTION 


At the core of probabilistic robotics is the idea of estimating state from sensor data. 
State estimation addresses the problem of estimating quantities from sensor data that 
are not directly observable, but that can be inferred. In most robotic applications, 
determining what to do is relatively easy if one only knew certain quantities. For 
example, moving a mobile robot is relatively easy if the exact location of the robot 
and all nearby obstacles are known. Unfortunately, these variables are not directly 
measurable. Instead, a robot has to rely on its sensors to gather this information. 
Sensors carry only partial information about those quantities, and their measurements 
are corrupted by noise. State estimation seeks to recover state variables from the data. 
Probabilistic state estimation algorithms compute belief distributions over possible 
world states. An example of probabilistic state estimation was already encountered in 
the introduction to this book: mobile robot localization. 


The goal of this chapter is to introduce the basic vocabulary and mathematical tools 
for estimating state from sensor data. 


m Section 2.2 introduces basic probabilistic concepts and notations used throughout 
the book. 


m Section 2.3 describes our formal model of robot environment interaction, setting 
forth some of the key terminology used throughout the book. 


m Section 2.4 introduces Bayes filters, the recursive algorithm for state estimation 
that forms the basis of virtually every technique presented in this book. 
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m Section 2.5 discusses representational and computational issues that arise when 
implementing Bayes filters. 


2.2 BASIC CONCEPTS IN PROBABILITY 


This section familiarizes the reader with the basic notation and probabilistic facts and 
notation used throughout the book. In probabilistic robotics, quantities such as sensor 
measurements, controls, and the states a robot and its environment might assume are 
all modeled as random variables. Random variables can take on multiple values, and 
they do so according to specific probabilistic laws. Probabilistic inference is the pro- 
cess of calculating these laws for random variables that are derived from other random 
variables, such as those modeling sensor data. 


Let X denote a random variable and x denote a specific event that X might take on. 
A standard example of a random variable is that of a coin flip, where X can take on 
the values head or tail. If the space of all values that .X can take on is discrete, as is 
the case if X is the outcome of a coin flip, we write 


p(X — x) Q.1) 


to denote the probability that the random variable X has value x. For example, a fair 
coin is characterized by p(X = head) = p(X = tail) = +. Discrete probabilities 
sum to one, that is, 


Sp ea = 1, (2.2) 


and of course, probabilities are always non-negative, that is, p(X = x) > 0. To 
simplify the notation, we will usually omit explicit mention of the random variable 
whenever possible, and instead use the common abbreviation p(x) instead of writing 
p(X = x). 


Most techniques in this book address estimation and decision making in continuous 
spaces. Continuous spaces are characterized by random variables that can take on a 
continuum of values. Throughout this book, we assume that all continuous random 
variables possess probability density functions (PDFs). A common density function 
is that of the one-dimensional normal distribution with mean p and variance o?. This 
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distribution is given by the following Gaussian function: 


р(х) = (2192)? oo [4827] (2.3) 


Normal distributions play a major role in this book. We will frequently abbreviate 
them as N (z; и, а?), which specifies the random variable, its mean, and its variance. 


The Normal distribution (2.3) assumes that т is a scalar value. Often, x will be a 
multi-dimensional vector. Normal distributions over vectors are called multivariate. 
Multivariate normal distributions are characterized by density functions of the follow- 
ing form: 


p(r) = det (2л) 2 exp {—5(х— i) sectores p) (2.4) 


Here и is the mean vector and X a (positive semidefinite) symmetric matrix called 
covariance matrix. The superscript ^ marks the transpose of a vector. The reader 
should take a moment to realize that Equation (2.4) is a strict generalization of Equa- 
tion (2.3); both definitions are equivalent if x is a scalar value. The PDFs of a one- 
and a two-dimensional normal distribution are graphically depicted in Figure 5.6. 


Equations (2.3) and (2.4) are examples of PDFs. Just as discrete probability distribu- 
tions always sums up to one, a PDF always integrates to 1: 


n da md Q.5) 


However, unlike a discrete probability, the value of a PDF is not bounded above by 1. 
Throughout this book, we will use the terms probability, probability density and prob- 
ability density function interchangeably. We will silently assume that all continuous 
random variables are measurable, and we also assume that all continuous distributions 
actually possess densities. 


The joint distribution of two random variables X and Y is given by 


p(r,y) = р(Х=хжаду =y). Q.6) 
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This expression describes the probability of the event that the random variable X takes 
on the value x and that Y takes on the value y. If X and Y are independent, we have 


p(z,y) = p(x) ply). (2.7) 


Often, random variables carry information about other random variables. Suppose we 
already know that Y's value is y, and we would like to know the probability that X's 
value is x conditioned on that fact. Such a probability will be denoted 


p(x | v) P(X =2|Y —y) (2.8) 


and is called conditional probability. If p(y) > 0, then the conditional probability is 
defined as 


p(r|y) = | (2.9) 


pr|y = ЕТЕ р(х). (2.10) 


In other words, if X and Y are independent, Y tells us nothing about the value of X. 
There is no advantage of knowing Y if our interest pertains to knowing X. Indepen- 
dence, and its generalization known as conditional independence, plays a major role 
throughout this book. 


An interesting fact, which follows from the definition of conditional probability and 
the axioms of probability measures, is often referred to as theorem of total probability: 


© 

S 

= 
Il 


X ple | y) p(y) (discrete case) (2.11) 
y 


© 

P us 

D 
II 


fre | y) p(y) dy (continuous case) (2.12) 


If p(x | y) or p(y) are zero, we define the product p(x | y) p(y) to be zero, regardless 
of the value of the remaining factor. 
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Equally important is Bayes rule, which relates conditionals of the type p(x | y) to 
their “inverse,” p(y | 2). The rule, as stated here, requires p(y) > 0: 


ply | х) р(х) _ р(у|хт)р(т) iscrete 
pz |y) K Жыт. “ee ee 
= ply | x) p(x) - ply | 2) рі) continuous 
е Табу |’) pa?) day (2019900) (214 


Bayes rule plays a predominant role in probabilistic robotics. If x is a quantity that we 
would like to infer from y, the probability p(x) will be referred to as prior probability 
distribution, and y is called the data (e.g., a sensor measurement). The distribution 
p(x) summarizes the knowledge we have regarding X prior to incorporating the data 
y. The probability p(x | y) is called the posterior probability distribution over X. As 
(2.14) suggests, Bayes rule provides a convenient way to compute a posterior p(z | y) 
using the "inverse" conditional probability p(y | x) along with the prior probability 
p(x). In other words, if we are interested in inferring a quantity x from sensor data 
y, Bayes rule allows us to do so through the inverse probability, which specifies the 
probability of data y assuming that x was the case. In robotics, this inverse probability 
is often coined “generative model,” since it describes, at some level of abstraction, 
how state variables X cause sensor measurements Y. 


An important observation is that the denominator of Bayes rule, p(y), does not depend 
on x. Thus, the factor p(y) + in Equations (2.13) and (2.14) will be the same for 


any value x in the posterior p(x | y). For this reason, p(y)~! is often written as a 
normalizer variable, and generically denoted n: 


p(z|v) = nply|2) р(х). (2.15) 


If X is discrete, equations of this type can be computed as follows: 


М2: айхыу = ply |x) p(x) (2.16) 
aux, = > auxgly (2.17) 
AUX gy 
Vr:p(r|y) = ——, (2.18) 
aux, 


where aux,|, and aux, are auxiliary variables. These instructions effectively calculate 
p(x | y), but instead of explicitly computing p(y), they instead just normalize the 
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result. The advantage of the notation in (2.15) lies in its brevity. Instead of explicitly 
providing the exact formula for a normalization constant—which can grow large very 
quickly in some of the mathematical derivations to follow—we simply will use the 
normalizer “n” to indicate that the final result has to be normalized to 1. Throughout 
this book, normalizers of this type will be denoted 7 (or n’, n”, ...). We will freely 
use the same 77 in different equations to denote normalizers, even if their actual values 


are different. 


The expectation of a random variable X is given by 


Ех] = rpl), 


ty 
E 
Il 


fe p(x) dx. (2.19) 


Not all random variables possess finite expectations; however, those that do not are of 
no relevance to the material presented in this book. The expectation is a linear function 
of a random variable. In particular, we have 


ElaX +b] = аЕХ|+Ь (2.20) 
for arbitrary numerical values a and b. The covariance of X is obtained as follows 
Cov[X] = E[X- E|X] = E[X?]|- E[X]? (2.21) 


The covariance measures the squared expected deviation from the mean. As stated 
above, the mean of a multivariate normal distribution V (x; и, X) is и, and its covari- 
ance is >]. 


Another important characteristic of a random variable is its entropy. For discrete ran- 
dom variables, the entropy is given by the following expression: 


H(P) = E[-logp(z)) = — (x) log, p(x). (2.22) 


The concept of entropy originates in information theory. The entropy is the expected 
information that the value of x carries: — log, p(x) is the number of bits required 
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to encode x using an optimal encoding, and p(x) is the probability at which x will 
be observed. In this book, entropy will be used in robotic information gathering, to 
express the information a robot may receive upon executing specific actions. 


Finally, we notice that it is perfectly fine to condition any of the rules discussed so far 
on arbitrary other random variables, such as the variable Z. For example, conditioning 
Bayes rule on Z — z gives us: 


p. PUE uL Е : E (2.23) 


Similarly, we can condition the rule for combining probabilities of independent ran- 
dom variables (2.7) on other variables z: 


р(2,у|2) = р(ш|)р(у|#). (2.24) 


Such a relation is known as conditional independence. As the reader easily verifies, 
(2.24) 1s equivalent to 


p(r|z) = m(x|zw) 
р(у|2) = ply| 2,2) (2.25) 


Conditional independence plays an important role in probabilistic robotics. It applies 
whenever a variable y carries no information about a variable x if another variable’s 
value z is known. Conditional independence does not imply (absolute) independence, 
that is, 


р(2,у |2) = p(x|z) р(у|2) v pv) = р(х) р(у) (2.26) 


The converse is also in general untrue: absolute independence does not imply condi- 
tional independence: 


р(х,у) = р(х) р(у) % р(2,у|2) = р(х| 2) р(у| 2) (2.27) 


In special cases, however, conditional and absolute independence may coincide. 
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Figure 2.1 Robot Environment Interaction. 


2.3 ROBOT ENVIRONMENT 
INTERACTION 


Figure 2.1 illustrates the interaction of a robot with its environment. The environ- 
ment, or world, of a robot is a dynamical system that possesses internal state. The 
robot can acquire information about its environment using its sensors. However, sen- 
sors are noisy, and there are usually many things that cannot be sensed directly. As 
а consequence, the robot maintains an internal belief with regards to the state of its 
environment, depicted on the left in this figure. The robot can also influence its en- 
vironment through its actuators. However, the effect of doing so is often somewhat 
unpredictable. This interaction will now be described more formally. 


2.3.1 State 


Environments are characterized by state. For the material presented in this book, it 
will be convenient to think of state as the collection of all aspects of the robot and 
its environment that can impact the future. State may change over time, such as the 
location of people; or it may remain static throughout the robot's operation, such as 
the location of walls in (most) buildings. State that changes will be called dynamic 
state, Which distinguishes it from static, or non-changing state. The state also includes 
variables regarding the robot itself, such as its pose, velocity, whether or not its sensors 
are functioning correctly, and so on. Throughout this book, state will be denoted z; 
although the specific variables included in x will depend on the context. The state at 
time t will be denoted x+. Typical state variables used throughout this book are: 
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m The robot pose, which comprises its location and orientation relative to а global 
coordinate frame. Rigid mobile robots possess six such state variables, three for 
their Cartesian coordinates, and three for their angular orientation, also called 
Euler angles (pitch, roll, and yaw). For rigid mobile robots confined to planar 
environments, the pose is usually given by three variables, its two location co- 
ordinates in the plane and its heading direction (yaw). The robot pose is often 
referred to as kinematic state. 


m  Theconfiguration of the robot's actuators, such as the joints of robotic manipula- 
tors. Each degree of freedom in a robot arm is characterized by a one-dimensional 
configuration at any point in time, which is part of the kinematic state of the robot. 


m The robot velocity and the velocities of its joints. A rigid robot moving through 
space is characterized by up to six velocity variables, one for each pose variables. 
Velocities are commonly referred to as dynamic state. Dynamic state will play 
only a minor role in this book. 


m  Thelocation and features of surrounding objects in the environment. An object 
may be a tree, a wall, or a pixel within a larger surface. Features of such objects 
may be their visual appearance (color, texture). Depending on the granularity of 
the state that is being modeled, robot environments possess between a few dozen 
and up to hundreds of billions of state variables (and more). Just imagine how 
many bits it will take to accurately describe your physical environment! For many 
of the problems studied in this book, the location of objects in the environment 
will be static. In some problems, objects will assume the form of landmarks, 
which are distinct, stationary features of the environment that can be recognized 
reliably. 


m The location and velocities of moving objects and people. Often, the robot is not 
the only moving actor in its environment. Other moving entities possess their 
own kinematic and dynamic state. 


m There can be a huge number of other state variables. For example, whether or 
not a sensor is broken is a state variable, as is the level of battery charge for a 
battery-powered robot. 


A state x; will be called complete if it is the best predictor of the future. Put differ- 
ently, completeness entails that knowledge of past states, measurements, or controls 
carry no additional information that would help us to predict the future more accu- 
rately. It it important to notice that our definition of completeness does not require the 
future to be a deterministic function of the state. The future may be stochastic, but 
no variables prior to x; may influence the stochastic evolution of future states, unless 
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this dependence is mediated through the state x+. Temporal processes that meet these 
conditions are commonly known as Markov chains. 


The notion of state completeness is mostly of theoretical importance. In practice, it 
is impossible to specify a complete state for any realistic robot system. A complete 
state includes not just all aspects of the environment that may have an impact on the 
future, but also the robot itself, the content of its computer memory, the brain dumps of 
surrounding people, etc. Those are hard to obtain. Practical implementations therefore 
single out a small subset of all state variables, such as the ones listed above. Such a 
state is called incomplete state. 


In most robotics applications, the state is continuous, meaning that x; is defined over a 
continuum. А good example of a continuous state space is that of a robot pose, that is, 
its location and orientation relative to an external coordinate system. Sometimes, the 
state is discrete. An example of a discrete state space is the (binary) state variable that 
models whether or not a sensor is broken. State spaces that contain both continuous 
and discrete variables are called hybrid state spaces. 


In most cases of interesting robotics problems, state changes over time. Time, through- 
out this book, will be discrete, that is, all interesting events will take place at discrete 
time steps 


UD mt. (2.28) 


If the robot starts its operation at a distinct point in time, we will denote this time as 
t=0. 


2.3.2 Environment Interaction 


There are two fundamental types of interactions between a robot and its environment: 
The robot can influence the state of its environment through its actuators. And it can 
gather information about the state through its sensors. Both types of interactions may 
co-occur, but for didactic reasons we will distinguish them throughout this book. The 
interaction is illustrated in Figure 2.1: 


= Sensor measurements. Perception is the process by which the robot uses its 
sensors to obtain information about the state of its environment. For example, 
a robot might take a camera image, a range scan, or query its tactile sensors 
to receive information about the state of the environment. The result of such a 
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perceptual interaction will be called a measurement, although we will sometimes 
also call it observation or percept. Typically, sensor measurements arrive with 
some delay. Hence they provide information about the state a few moments ago. 


= Control actions change the state of the world. They do so by actively asserting 
forces on the robot's environment. Examples of control actions include robot 
motion and the manipulation of objects. Even if the robot does not perform any 
action itself, state usually changes. Thus, for consistency, we will assume that the 
robot always executes a control action, even if it chooses not to move any of its 
motors. In practice, the robot continuously executes controls and measurements 
are made concurrently. 


Hypothetically, a robot may keep a record of all past sensor measurements and control 
actions. We will refer to such a the collection as the data (regardless of whether they 
are being memorized). In accordance with the two types of environment interactions, 
the robot has access to two different data streams. 


= Measurement data provides information about a momentary state of the envi- 
ronment. Examples of measurement data include camera images, range scans, 
and so on. For most parts, we will simply ignore small timing effects (e.g., most 
ladar sensors scan environments sequentially at very high speeds, but we will 
simply assume the measurement corresponds to a specific point in time). The 
measurement data at time t will be denoted 


24 (2.29) 


Throughout most of this book, we simply assume that the robot takes exactly опе 
measurement at a time. This assumption is mostly for notational convenience, as 
nearly all algorithms in this book can easily be extended to robots that can acquire 
variables numbers of measurements within a single time step. The notation 


Zt4:t = Ztis ty +15 1+2 t ‚ te (2.30) 


denotes the set of all measurements acquired from time t; to time £5, for tı < te. 


= Control data carry information about the change of state in the environment. 
In mobile robotics, a typical example of control data is the velocity of a robot. 
Setting the velocity to 10 cm per second for the duration of five seconds suggests 
that the robot's pose, after executing this motion command, is approximately 50 
cm ahead of its pose before command execution. Thus, its main information 
regards the change of state. 
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An alternative source of control data are odometers. Odometers are sensors that 
measure the revolution of a robot's wheels. As such they convey information 
about the change of the state. Even though odometers are sensors, we will treat 
odometry as control data, since its main information regards the change of the 
robot's pose. 


Control data will be denoted u;. The variable и; will always correspond to the 
change of state in the time interval (t — 1; t]. As before, we will denote sequences 
of control data by t, :t3, for tı < t»: 


Unda = Uns Ut 41; Ur 42; s Uta + (2.31) 


Since the environment may change even if a robot does not execute a specific 
control action, the fact that time passed by constitutes, technically speaking, con- 
trol information. Hence, we assume that there is exactly one control data item 
per time step t. 


The distinction between measurement and control is a crucial one, as both types of 
data play fundamentally different roles in the material yet to come. Perception pro- 
vides information about the environment’s state, hence it tends to increase the robot's 
knowledge. Motion, on the other hand, tends to induce a loss of knowledge due to the 
inherent noise in robot actuation and the stochasticity of robot environments; although 
sometimes a control makes the robot more certain about the state. By no means is our 
distinction intended to suggest that actions and perceptions are separated in time, 1.е., 
that the robot does not move while taking sensor measurements. Rather, perception 
and control takes place concurrently; many sensors affect the environment; and the 
separation is strictly for convenience. 


2.3.3 Probabilistic Generative Laws 


The evolution of state and measurements is governed by probabilistic laws. In gen- 
eral, the state at time x, is generated stochastically. Thus, it makes sense to specify the 
probability distribution from which x; is generated. At first glance, the emergence of 
state z, might be conditioned on all past states, measurements, and controls. Hence, 
the probabilistic law characterizing the evolution of state might be given by a proba- 
bility distribution of the following form: 


pee | $031 234 1 Чы) (2.32) 


(Notice that through no particular motivation we assume here that the robot executes 
a control action ш first, and then takes a measurement z1.) However, if the state x is 
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complete then it is a sufficient summary of all that happened in previous time steps. 
In particular, 2,1 is a sufficient statistic of all previous controls and measurements 
up to this point, that 15, u1:+—1 and 21.._1. From all the variables in the expression 
above, only the control и matters if we know the state z; 1. In probabilistic terms, 
this insight is expressed by the following equality: 


p(xe| zou—i.214—1, 014) = р(2 | 201,0) (2.33) 


The property expressed by this equality is an example of conditional independence. It 
states that certain variables are independent of others if one knows the values of a third 
group of variables, the conditioning variables. Conditional independence will be ex- 
ploited pervasively in this book, as it is the main source of tractability of probabilistic 
robotics algorithms. 


Similarly, one might want to model the process by which measurements are being 
generated. Again, if z, is complete, we have an important conditional independence: 


plz | 20:6, Z1:t—1; U1:t) = plz | x) (2.34) 


In other words, the state x+ is sufficient to predict the (potentially noisy) measurement 
Zt. Knowledge of any other variable, such as past measurements, controls or even past 
states, is irrelevant if x; is complete. 


This discussion leaves open as to what the two resulting conditional probabilities are: 
p(x | 1-1, Ue) and p(z | 2). The probability p(z, | 2.1, uz) is the state transition 
probability. It specifies how environmental state evolves over time as a function of 
robot controls u;. Robot environments are stochastic, which is reflected by the fact that 
p(x | 2.1, Uz) is a probability distribution, not a deterministic function. Sometimes 
the state transition distribution does not depend on the time index t, in which case we 
may write it as p(x’ | u, x), where x’ is the successor and x the predecessor state. 


The probability p(z; | x) is called the measurement probability. It also may not 
depend on the time index t, in which case it shall be written as p(z | x). The mea- 
surement probability specifies the probabilistic law according to which measurements 
z are generated from the environment state x. Measurements are usually noisy projec- 
tions of the state. 


The state transition probability and the measurement probability together describe the 
dynamical stochastic system of the robot and its environment. Figure ?? illustrates 
the evolution of states and measurements, defined through those probabilities. The 
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state at time t is stochastically dependent on the state at time t — 1 and the control 
ut. The measurement z; depends stochastically on the state at time t. Such a temporal 
generative model is also known as hidden Markov model (HMM) or dynamic Bayes 
network (DBN). To specify the model fully, we also need an initial state distribution 


p(zo). 


2.3.4 Belief Distributions 


Another key concept in probabilistic robotics is that of a belief. A belief reflects the 
robot's internal knowledge about the state of the environment. We already discussed 
that state cannot be measured directly. For example, a robot's pose might be x = 
(14.12, 12.7,0.755) in some global coordinate system, but it usually cannot know its 
pose, since poses are not measurable directly (not even with GPS!). Instead, the robot 
must infer its pose from data. We therefore distinguish the true state from its internal 
belief, or state of knowledge with regards to that state. 


Probabilistic robotics represents beliefs through conditional probability distributions. 
A belief distribution assigns a probability (or density value) to each possible hypoth- 
esis with regards to the true state. Belief distributions are posterior probabilities over 
state variables conditioned on the available data. We will denote belief over a state 
variable 2; by bel(x;), which is an abbreviation for the posterior 


bel(zxi) = p(x | Zit, Urt). (2.35) 


This posterior is the probability distribution over the state x; at time t, conditioned on 
all past measurements 21. and all past controls u 4. 


The reader may notice that we silently assume that the belief is taken after incorpo- 
rating the measurement 2,. Occasionally, it will prove useful to calculate a posterior 
before incorporating z+, just after executing the control u;. Such a posterior will be 
denoted as follows: 


bel(z) = plx | z14-1, Urt) (2.36) 


This probability distribution is often referred to as prediction in the context of prob- 
abilistic filtering. This terminology reflects the fact that bel(x+) predicts the state at 
time t based on the previous state posterior, before incorporating the measurement 
at time t. Calculating bel(z;) from bel(z;) is called correction or the measurement 
update. 
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2.4 BAYES FILTERS 


2.4.1 The Bayes Filter Algorithm 


The most general algorithm for calculating beliefs is given by the Bayes filter algo- 
rithm. This algorithm calculates the belief distribution bel from measurement and 
control data. We will first state the basic algorithm and elucidate it with a numerical 
example. After that, we will derive it mathematically from the assumptions made so 
far. 


Table 2.1 depicts the basic Bayes filter in pseudo-algorithmic form. The Bayes filter is 
recursive, that is, the belief bel (x+) at time t is calculated from the belief bel(x,_1) at 
time t — 1. Its input is the belief bel at time t — 1, along with the most recent control и; 
and the most recent measurement z+. Its output is the belief bel (x+) at time t. Table 2.1 
only depicts a single step of the Bayes Filter algorithm: the update rule. This update 
rule is applied recursively, to calculate the belief bel(z;) from the belief bel(z,. 1), 
calculated previously. 


The Bayes filter algorithm possesses two essential steps. In Line 3, it processes the 
control u;. It does so by calculating a belief over the state x, based on the prior 
belief over state x;_1 and the control uz. In particular, the belief bel(x+) that the robot 
assigns to state x+ is obtained by the integral (sum) of the product of two distributions: 
the prior assigned to 2; 1, and the probability that control u+ induces a transition from 
23-1 to x+. The reader may recognize the similarity of this update step to Equation 
(2.12). As noted above, this update step is called the control update, or prediction. 


The second step of the Bayes filter is called the measurement update. In Line 4, the 
Bayes filter algorithm multiplies the belief bel(a,) by the probability that the measure- 
ment z; may have been observed. It does so for each hypothetical posterior state x+. 
As will become apparent further below when actually deriving the basic filter equa- 
tions, the resulting product is generally not a probability, that is, it may not integrate 
to 1. Hence, the result is normalized, by virtue of the normalization constant 7. This 
leads to the final belief bel(x,), which is returned in Line 6 of the algorithm. 


To compute the posterior belief recursively, the algorithm requires an initial belief 
bel(aq) at time t = 0 as boundary condition. If one knows the value of x with 
certainty, bel(xo) should be initialized with a point mass distribution that centers all 
probability mass on the correct value of zo, and assigns zero probability anywhere 
else. If one is entirely ignorant about the initial value xo, bel(zo) may be initialized 
using a uniform distribution over the domain of то (or related distribution from the 
Dirichlet family of distributions). Partial knowledge of the initial value xo can be 
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Algorithm Bayes filter(bel(z, 1), ше, 21): 
for all т, do 
bel(z4) = f pla | Ut, 2t) Бе1 (2.1) dx 
bel(zi) = т] p(t | x+) bel(az) 
endfor 
return bel(z,) 


Qv Ux ce actas 


Table 2.1 The general algorithm for Bayes fi Itering. 


expressed by non-uniform distributions; however, the two cases of full knowledge and 
full ignorance are the most common ones in practice. 


The algorithm Bayes filter can only be implemented in the form stated here for very 
simple estimation problems. In particular, we either need to be able to carry out the 
integration in Line 3 and the multiplication in Line 4 in closed form, or we need to 
restrict ourselves to finite state spaces, so that the integral in Line 3 becomes a (finite) 
sum. 


2.4.2 Example 


Our illustration of the Bayes filter algorithm is based on the scenario in Figure 2.2, 
which shows a robot estimating the state of a door using its camera. To make this 
problem simple, let us assume that the door can be in one of two possible states, open 
or closed, and that only the robot can change the state of the door. Let us furthermore 
assume that the robot does not know the state of the door initially. Instead, it assigns 
equal prior probability to the two possible door states: 


bel(Xo = open) = 0.5 (2.37) 
bel( Xo = closed) (2.38) 


Let us furthermore assume the robot's sensors are noisy. The noise is characterized by 
the following conditional probabilities: 


p(Z, —sense.open| X, = is.open) = 0.6 
p(Z, = sense_closed | X, = is.open) = 0.4 (2.39) 
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Figure 2.2 А mobile robot estimating the state of a door. 


and 


p(Z, = sense_open | X, = is.closed) = 0.2 
p(Z, = sense.closed | X, =is-closed) = 0.8 (2.40) 


These probabilities suggest that the robot's sensors are relatively reliable in detecting 
a closed door, in that the error probability is 0.2. However, when the door is open, it 
has a 0.4 probability of a false measurement. 


Finally, let us assume the robot uses its manipulator to push the door open. If the door 
is already open, it will remain open. It it is closed, the robot has a 0.8 chance that it 
will be open afterwards: 


p(X; = is. open | U; = push, X, ; = is.open) 1 
p(X; = is_closed | U; = push, X; і = is.open) = 0 (2.41) 
p(X; = is_open | U; = push, X; ; = is.closed) = 0.8 
p(X; = is_closed | U; = push, X; ; = isclosed) = 0.2 (2.42) 


It can also choose not to use its manipulator, in which case the state of the world does 
not change. This is stated by the following conditional probabilities: 


p(X; = is_open | U; = do. nothing, X, ; = іѕ ореп) = 1 
p(X; = is.closed | U; = do. nothing, X, ; =isopen) = 0 (2.43) 

p(X; = is_open | U; = do_nothing, X, ; = is_closed) = 0 
p(X; = is.closed | U; = do_nothing, X, у = is.closed) = 1 (2.44) 
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Suppose at time f£, the robot takes no control action but it senses an open door. The re- 
sulting posterior belief is calculated by the Bayes filter using the prior belief bel( Xo), 
the control u; — do.nothing, and the measurement sense.open as input. Since 
the state space is finite, the integral in Line 3 turns into a finite sum: 


bel(x1) [oe | u1, £o) bel(xo) ахо 


X pla | u1, o) bel(xo) 


то 


р(21 | U; = do.nothing, Xo = is_open) bel( Xo = is. open) 
+p(a1 | U; = do.nothing, Xo = is.closed) bel( Хо = is.closed) 
(2.45) 


We can now substitute the two possible values for the state variable Ху. For the 
hypothesis X, = is. open, we obtain 


bel(X4 = is. open) 
= p(X, = is_open | U; = do.nothing, Xo = is open) bel( Xo = is open) 


+р(Х = is. open | U, = do. nothing, Хо = is.closed) bel( Xo = is.closed) 


Likewise, for X, = is. closed we get 


bel( X4 = is.closed) 


= 1-05+0-05 = 0.5 (2.46) 


= p(X, = is_closed | U, = do_nothing, Xo = is_open) bel( Xy = is. open) 
+р(Х = is_closed | U; = do nothing, Xo = is closed) bel(Xo = is closed) 


0-05+1-0.5 = 0.5 


The fact that the belief bel(x1) equals our prior belief bel(ao) should not surprise, as 
the action do_nothing does not affect the state of the world; neither does the world 
change over time by itself in our example. 


Incorporating the measurement, however, changes the belief. Line 4 of the Bayes filter 
algorithm implies 


bel(zj) = 1 p(Z, =sense_open | z1) bel(zi) . (2.48) 


(2.47) 
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For the two possible cases, Ху = is open and X, = is. closed, we get 
bel( X1 = is. open) 
= mp(Zi- sense open | X, = is open) bel( X; = is open) 
= m0.6.0.5 = 70.3 (2.49) 
and 
bel( X4 — is.closed) 
= mp(Zi- ѕепѕе ореп | X, = is.closed) bel( X = is.closed) 
= 70.2-0.5 = 70.1 (2.50) 
The normalizer 77 is now easily calculated: 
n = (03401)! = 25 (2.51) 


Hence, we have 


bel(X, —is.open) = 0.75 
bel(X, —is.closed) = 0.25 (2.52) 


This calculation is now easily iterated for the next time step. As the reader easily 
verifies, for ио = push and zz = sense open we get 


1-0.75+0.8-0.25 = 0.95 
bel(X2 —is.closed) = 0-0.75+0.2-0.25 = 0.05, (2.53) 


bel( X5 = is_open) 


and 


bel(Xo =isopen) = 10.6: 0.95 
bel( X5 = is.closed) 


0.983 
n 0.2.0.05 z 0.017. (2.54) 


Q 


? 


At this point, the robot believes that with 0.983 probability the door is open, hence 
both its measurements were correct. At first glance, this probability may appear to be 
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sufficiently high to simply accept this hypothesis as the world state and act accord- 
ingly. However, such an approach may result in unnecessarily high costs. If mistaking 
a closed door for an open one incurs costs (e.g., the robot crashes into a door), con- 
sidering both hypotheses in the decision making process will be essential, as unlikely 
as one of them may be. Just imagine flying an aircraft on auto pilot with a perceived 
chance of 0.983 for not crashing! 


2.4.3 Mathematical Derivation of the Bayes 
Filter 


The correctness of the Bayes filter algorithm is shown by induction. To do so, we need 
to show that it correctly calculates the posterior distribution p(z; | 21:4, u14) from the 
corresponding posterior one time step earlier, р(ж,—1 | 214—1, 14-1). The correct- 
ness follows then by induction under the assumption that we correctly initialized the 
prior belief bel(xg) at time t = 0. 


Our derivation requires that the state x; is complete, as defined in Section 2.3.1, and 
it requires that controls are chosen at random. The first step of our derivation involves 
the application of Bayes rule (2.23) to the target posterior: 


plz | Tt, Z1:t—1; U1:) p(zi | Zi-1,U1:) 
plz | 214—1, U1:t) 


= n (zu | Lt, #1д—1,Ч1:) p(x | 21:4—1, U1:t) (2.55) 


p(zi | Zit Ча) = 


We now exploit the assumption that our state is complete. In Section 2.3.1, we defined 
a state x, to be complete if no variables prior to x, may influence the stochastic evo- 
lution of future states. In particular, if we (hypothetically) knew the state x; and were 
interested in predicting the measurement z+, no past measurement or control would 
provide us additional information. In mathematical terms, this is expressed by the 
following conditional independence: 


p(zi | Tt, Bie ds Чы) = p(zi | Lt) . (2.56) 


Such a statement is another example of conditional independence. It allows us to 
simplify (2.55) as follows: 


p(zi | Zi Чы) = n p(z | Lt) p(zi | Zia) Urt) (2.57) 
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and hence 
bel(r;)) = mp(z |a) bel(az) (2.58) 


This equation is implemented in Line 4 of the Bayes filter algorithm in Table 2.1. 


Next, we expand the term bel(z,), using (2.12): 


bel(a+) p(xe | zai, Чы) 


[oe | 211, Zi:t—1; Ut) p(zia | ziii) dai (2.59) 


Once again, we exploit the assumption that our state is complete. This implies if we 
know 2,1, past measurements and controls convey no information regarding the state 
тү. This gives us 


p(zi | Dy iE piat) = p(x | 2,11, Ut) (2.60) 


Here we retain the control variable u+, since it does not predate the state x;_1. Finally, 
we note that the control u; can safely be omitted from the set of conditioning variables 
in p(ziài | z14—1, 014) for randomly chosen controls. This gives us the recursive 
update equation 


bel(z;) = ] | T, 1,U1) p(zi 1 | 2—1, U1:t 1) dai (2.61) 


As the reader easily verifies, this equation is implemented by Line 3 of the Bayes 
filter algorithm in Table 2.1. To summarize, the Bayes filter algorithm calculates the 
posterior over the state x; conditioned on the measurement and control data up to time 
t. The derivation assumes that the world is Markov, that is, the state is complete. 


Any concrete implementation of this algorithm requires three probability distributions: 
The initial belief p(zo), the measurement probability p(z; | x+), and the state transition 
probability p(z, | ut, 2.1). We have not yet specified these densities, but will do so 
in later chapters (Chapters 5 and ??). Additionally, we also need a representation for 
the belief bel(z,), which will also be discussed further below. 


30 CHAPTER 2 


2.4.4 The Markov Assumption 


A word is in order on the Markov assumption, or the complete state assumption, since 
it plays such a fundamental role in the material presented in this book. The Markov 
assumption postulates that past and future data are independent if one knows the cur- 
rent state тү. To see how severe an assumption this is, let us consider our example 
of mobile robot localization. In mobile robot localization, x; is the robot's pose, and 
Bayes filters are applied to estimate the pose relative to a fixed map. The following 
factors may have a systematic effect on sensor readings. Thus, they induce violations 
of the Markov assumption: 


= Unmodeled dynamics in the environment not included in x; (e.g., moving people 
and their effects on sensor measurements in our localization example), 


ш inaccuracies in the probabilistic models p(z; | £+) and р(х; | we, 2.1), 


m approximation errors when using approximate representations of belief functions 
(e.g., grids or Gaussians, which will be discussed below), and 


m software variables in the robot control software that influence multiple control se- 
lection (e.g., the variable "target location" typically influences an entire sequence 
of control commands). 


In principle, many of these variables can be included in state representations. How- 
ever, incomplete state representations are often preferable to more complete ones to 
reduce the computational complexity of the Bayes filter algorithm. In practice Bayes 
filters have been found to be surprisingly robust to such violations. As a general rule 
of thumb, one has to exercise care when defining the state x+, so that the effect of 
unmodeled state variables has close-to-random effects. 


2.5 REPRESENTATION AND 
COMPUTATION 


In probabilistic robotics, Bayes filters are implemented in several different ways. As 
we will see in the next two chapters, there exist quite a variety of techniques and 
algorithms that are all derived from the Bayes filter. Each such technique relies on dif- 
ferent assumptions regarding the measurement and state transition probabilities and 
the initial belief. Those assumptions then give rise to different types of posterior dis- 
tributions, and the algorithms for computing (or approximating) those have different 
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computational characteristics. As a general rule of thumb, exact techniques for cal- 
culating beliefs exist only for highly specialized cases; in general robotics problems, 
beliefs have to be approximated. The nature of the approximation has important ram- 
ifications on the complexity of the algorithm. Finding a suitable approximation is 
usually a challenging problem, with no unique best answer for all robotics problems. 


When choosing an approximation, one has to trade off a range of properties: 


1. Computational efficiency. Some approximations, such as linear Gaussian ap- 
proximations that will be discussed further below, make it possible to calculate 
beliefs in time polynomial in the dimension of the state space. Others may re- 
quire exponential time. Particle based techniques, discussed further below, have 
an any-time characteristic, enabling them to trade off accuracy with computa- 
tional efficiency. 


2. Accuracy of the approximation. Some approximations can approximate a 
wider range of distributions more tightly than others. For example, linear Gaus- 
sian approximations are limited to unimodal distributions, whereas histogram 
representations can approximate multi-modal distributions, albeit with limited 
accuracy. Particle representations can approximate a wide array of distributions, 
but the number of particles needed to attain a desired accuracy can be large. 


3. Ease of implementation. The difficulty of implementing probabilistic algo- 
rithms depends on a variety of factors, such as the form of the measurement 
probability p(z; | ж) and the state transition probability р(х; | ut, 2,1). Parti- 
cle representations often yield surprisingly simple implementations for complex 
nonlinear systems—one of the reasons for their recent popularity. 


The next two chapters will introduce concrete implementable algorithms, which fare 
quite differently relative to the criteria described above. 


2.6 SUMMARY 


In this section, we introduced the basic idea of Bayes filters in robotics, as a means to 
estimate the state of an environment (which may include the state of the robot itself). 


m The interaction of a robot and its environment is modeled as a coupled dynamical 
system, in which the robot can manipulate its environment by choosing controls, 
and in which it can perceive its environment through sensor measurements. 
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In probabilistic robotics, the dynamics of the robot and its environment are char- 
acterized in the form of two probabilistic laws: the state transition distribution, 
and the measurement distribution. The state transition distribution characterizes 
how state changes over time, possible as the effect of a robot control. The mea- 
surement distribution characterizes how measurements are governed by states. 
Both laws are probabilistic, accounting for the inherent uncertainty in state evo- 
lution and sensing. 


The belief of a robot is the posterior distribution over the state of the environ- 
ment (including the robot state), given all past sensor measurements and all past 
controls. The Bayes filter is the principal algorithm for calculating the belief in 
robotics. The Bayes filter is recursive; the belief at time t is calculated from the 
belief at time t — 1. 


The Bayes filter makes a Markov assumption that specifies that the state is a 
complete summary of the past. This assumption implies the belief is sufficient 
to represent the past history of the robot. In robotics, the Markov assumption 
is usually only an approximation. We identified conditions under which it is 
violated. 


Since the Bayes filter is not a practical algorithm, in that it cannot be imple- 
mented on a digital computer, probabilistic algorithms use tractable approxima- 
tions. Such approximations may be evaluated according to different criteria, re- 
lating to their accuracy, efficiency, and ease of implementation. 


The next two chapters discuss two popular families of recursive state estimation tech- 
niques that are both derived from the Bayes filter. 


2.7 BIBLIOGRAPHICAL REMARKS 


GAUSSIAN FILTERS 


3.1 INTRODUCTION 


This chapter describes an important family of recursive state estimators, collectively 
called Gaussian Filters. Historically, Gaussian filters constitute the earliest tractable 
implementations of the Bayes filter for continuous spaces. They are also by far the 
most popular family of techniques to date—despite a number of shortcomings. 


Gaussian techniques all share the basic idea that beliefs are represented by multivariate 
normal distributions. We already encountered a definition of the multivariate normal 
distribution in Equation (2.4), which is restated here: 


p(r) = det (223) 3 exp (- (zr — WTE! (x — p) (3.1) 


This density over the variable x is characterized by two sets of parameters: The mean и 
and the covariance У. The mean џ is a vector that possesses the same dimensionality 
as the state x. The covariance is a quadratic matrix that is symmetric and positive- 
semidefinite. Its dimension is the dimensionality of the state z squared. Thus, the 
number of elements in the covariance matrix depends quadratically on the number of 
elements in the state vector. 


The commitment to represent the posterior by a Gaussian has important ramifications. 
Most importantly, Gaussians are unimodal, that is, they posses a single maximum. 
Such a posterior is characteristic of many tracking problems in robotics, in which the 
posterior is focused around the true state with a small margin of uncertainty. Gaus- 
sian posteriors are a poor match for many global estimation problems in which many 
distinct hypotheses exist, each of which forming its own mode in the posterior. 
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The representation of a Gaussian by its mean and covariance is called the moments 
representation. 'This is because the mean and covariance are the first and second mo- 
ments of a probability distribution; all other moments are zero for normal distributions. 
In this chapter, we will also discuss an alternative representation, called canonical rep- 
resentation, or sometimes natural representation. Both representations, the moments 
and the canonical representations, are functionally equivalent in that a bijective map- 
ping exists that transforms one into the other (and back). However, they lead to filter 
algorithms with orthogonal computational characteristics. 


This chapter introduces the two basic Gaussian filter algorithms. 


m Section 3.2 describes the Kalman filter, which implements the Bayes filter using 
the moments representation for a restricted class of problems with linear dynam- 
ics and measurement functions. 


m Тһе Kalman filter is extended to nonlinear problems in Section 3.3, which de- 
scribes the extended Kalman filter. 


m Section 3.4 describes the information filter, which is the dual of the Kalman filter 
using the canonical representation of Gaussians. 


3.2 THE KALMAN FILTER 


3.2.1 Linear Gaussian Systems 


Probably the best studied technique for implementing Bayes filters is the Kalman filter 
(KF). The Kalman filter was invented in the 1950s by Rudolph Emil Kalman, as a 
technique for filtering and prediction in linear systems. The Kalman filter implements 
belief computation for continuous states. It is not applicable to discrete or hybrid state 
spaces. 


The Kalman filter represents beliefs by the moments representation: At time t, the 
belief is represented by the the mean u+ and the covariance >. Posteriors are Gaussian 
if the following three properties hold, in addition to the Markov assumptions of the 
Bayes filter. 


1. The next state probability p(x: | ut, z;..1) must be a linear function in its argu- 
ments with added Gaussian noise. This is expressed by the following equation: 


2р = Aq 4 + Ви Fet: (3.2) 
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Here x; and т,—1 are state vectors, and и; is the control vector at time t. In our 
notation, both of these vectors are vertical vectors, that is, they are of the form 


Ti,t U1,t 
T2.t u2.t 

ть = A and ut = . . (3.3) 
Tn,t Um,t 


A, and B; are matrices. A, is a square matrix of size n x n, where n is the 
dimension of the state vector x+. D; is of size n x m, with m being the dimension 
of the control vector u;. By multiplying the state and control vector with the 
matrices А, and В,, respectively, the state transition function becomes linear in 
its arguments. Thus, Kalman filters assume linear system dynamics. 


The random variable =; in (3.2) is a Gaussian random vector that models the ran- 
domness in the state transition. It is of the same dimension as the state vector. Its 
mean is zero and its covariance will be denoted R+. A state transition probability 
of the form (3.2) is called a linear Gaussian, to reflect the fact that it is linear in 
its arguments with additive Gaussian noise. 


Equation (3.2) defines the state transition probability р(2 | ut, 2.1). This prob- 
ability is obtained by plugging Equation (3.2) into the definition of the multi- 
variate normal distribution (3.1). The mean of the posterior state is given by 
Духу 1 + Ви and the covariance by Ry: 


p(z« | Ut, 241) (3.4) 
1 
— det (21R,) 2 exp (iQ: m At = Bou)? Ry (o, = At == Ви) } 


2. The measurement probability p(z; | x+) must also be linear in its arguments, with 
added Gaussian noise: 


ep = Cyr + 0; . (3.5) 


Here С, is a matrix of size k x n, where k is the dimension of the measurement 
vector z;. The vector б, describes the measurement noise. The distribution of 6, 
is a multivariate Gaussian with zero mean and covariance Q+. The measurement 
probability is thus given by the following multivariate normal distribution: 


1 
p(ze| ve) = Яе (276) 2 exp {—3(z — С.а) Qr! (4 — € х,)} 
(3.6) 
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Algorithm Kalman filter(j,. 1,34 1, Ut, 2): 
fir = А ш-1 + Be ш 
У, = А, У, 1 AP +В 
Ky = Xt CP (Cy Ў, CF + Qe)! 
ш = Шш + Ku — б, jit) 
У, = (I-K: Ci) 23g 
return ре, Xt 


SANA do ра 


Table 3.1 The Kalman fi lter algorithm for linear Gaussian state transitions and measure- 
ments. 


3. Finally, the initial belief bel(xo) must be normal distributed. We will denote the 
mean of this belief by ие and the covariance by No: 


bel(zo) = р(х) = det (27) 2 exp [—1 (zo — no) X! (zo — но)} 


These three assumptions are sufficient to ensure that the posterior bel(z,) is always 
a Gaussian, for any point in time t. The proof of this non-trivial result can be found 
below, in the mathematical derivation of the Kalman filter (Section 3.2.4). 


3.2.2 The Kalman Filter Algorithm 


The Kalman filter algorithm is depicted in Table 3.1. Kalman filters represent the 
belief bel (x+) at time t by the mean ju, and the covariance X. The input of the Kalman 
filter is the belief at time t — 1, represented by ш; and 3 ,. To update these 
parameters, Kalman filters require the control u; and the measurement z;. The output 
is the belief at time t, represented by u+ and X. 


In Lines 2 and 3, the predicted belief j; and X is calculated representing the belief 
bel (x+) one time step later, but before incorporating the measurement z+. This belief is 
obtained by incorporating the control u;. The mean is updated using the deterministic 
version of the state transition function (3.2), with the mean шг: substituted for the 
state z; 1. The update of the covariance considers the fact that states depend on 
previous states through the linear matrix A+. This matrix is multiplied twice into the 
covariance, since the covariance is a quadratic matrix. 
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The belief bel(;) is subsequently transformed into the desired belief bel(z,) in Lines 
4 through 6, by incorporating the measurement z;. The variable Кү, computed in 
Line 4 is called Kalman gain. It specifies the degree to which the measurement is 
incorporated into the new state estimate. Line 5 manipulates the mean, by adjusting 
it in proportion to the Kalman gain K, and the deviation of the actual measurement, 
24, and the measurement predicted according to the measurement probability (3.5). 
Finally, the new covariance of the posterior belief is calculated in Line 6, adjusting for 
the information gain resulting from the measurement. 


The Kalman filter is computationally quite efficient. For today's best algorithms, the 
complexity of matrix inversion is approximately O(d?5) for a matrix of size d x d. 
Each iteration of the Kalman filter algorithm, as stated here, is lower bounded by 
(approximately) O(k?:5), where k is the dimension of the measurement vector z+. This 
(approximate) cubic complexity stems from the matrix inversion in Line 4. It is also at 
least in O(n?), where n is the dimension of the state space, due to the multiplication 
in Line 6 (the matrix K,C, may be sparse). In many applications—such as the robot 
mapping applications discussed in later chapters—-the measurement space is much 
lower dimensional than the state space, and the update is dominated by the O(n?) 
operations. 


3.2.3 Illustration 


Figure 3.2 illustrates the Kalman filter algorithm for a simplistic one-dimensional lo- 
calization scenario. Suppose the robot moves along the horizontal axis in each diagram 
in Figure 3.2. Let the prior over the robot location be given by the normal distribution 
shown in Figure 3.2a. The robot queries its sensors on its location (e.g., a GPS sys- 
tem), and those return a measurement that is centered at the peak of the bold Gaussian 
in Figure 3.2b. This bold Gaussian illustrates this measurement: Its peak is the value 
predicted by the sensors, and its width (variance) corresponds to the uncertainty in the 
measurement. Combining the prior with the measurement, via Lines 4 through 6 of 
the Kalman filter algorithm in Table 3.1, yields the bold Gaussian in Figure 3.2c. This 
belief's mean lies between the two original means, and its uncertainty radius is smaller 
than both contributing Gaussians. The fact that the residual uncertainty is smaller than 
the contributing Gaussians may appear counter-intuitive, but it is a general character- 
istic of information integration in Kalman filters. 


Next, assume the robot moves towards the right. Its uncertainty grows due to the fact 
that the next state transition is stochastic. Lines 2 and 3 of the Kalman filter provides 
us with the Gaussian shown in bold in Figure 3.2d. This Gaussian is shifted by the 
amount the robot moved, and it is also wider for the reasons just explained. Next, the 
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Figure 3.2 Illustration of Kalman filters: (a) initial belief, (b) a measurement (in bold) 
with the associated uncertainty, (c) belief after integrating the measurement into the belief 
using the Kalman filter algorithm, (d) belief after motion to the right (which introduces 
uncertainty), (e) a new measurement with associated uncertainty, and (f) the resulting belief. 
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robot receives a second measurement illustrated by the bold Gaussian in Figure 3.2e, 
which leads to the posterior shown in bold in Figure 3.2f. 


As this example illustrates, the Kalman filter alternates a measurement update step 
(Lines 5-7), in which sensor data is integrated into the present belief, with a prediction 
step (or control update step), which modifies the belief in accordance to an action. 
The update step decreases and the prediction step increases uncertainty in the robot's 
belief. 


3.2.4 Mathematical Derivation of the КЕ 


This section derives the Kalman filter algorithm in Table 3.1. The section can safely 
be skipped at first reading. 


Part 1: Prediction. Our derivation begins with Lines 2 and 3 of the algorithm, in 
which the belief bel (x+) is calculated from the belief one time step earlier, bel(z,. 1). 
Lines 2 and 3 implement the update step described in Equation (2.61), restated here 
for the reader's convenience: 


bel(z,) = / p(x | zu, ud) bel(z 1) daz 4 (3.7) 
jee —— 


AN (mis Asia Bois, Fa) ~N (xiciipici, Sai) 


The “prior” belief bel(x,_1) is represented by the mean 4.1 and the covariance X, 1. 
The state transition probability p(z; | 24-1, uz) was given in (3.4) as a normal distri- 
bution over x; with mean A;z;. + Brus and covariance R;. As we shall show now, 
the outcome of (3.7) is again a Gaussian with mean ji, and covariance X; as stated in 
Table 3.1. 


We begin by writing (3.7) in its Gaussian form: 


bel(z,) — =з] exp {— 5 (т — At 24-1 — В ш)? m (zi — At zi-1 — Bi ut) } 
( 


exp 1—1 (201 — pa) £l (a — шт) } dzca. (3.8) 


In short, we have 


bel(z;) = m [5922 rn (3.9) 
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with 


i = i (a, — Ay $11 — В, шщ)? Rag? (a, — Ay 241 — By u) 
+ 1 (201 = oe 1)? Ecl (zia — ina). (3.10) 


Notice that L+ is quadratic in 2,1; it is also quadratic in x+. 


Expression (3.9) contains an integral. To solve this integral in closed form, we will 
now decompose L into two functions, L;(zi 1, 24) and (ш): 


Le = Lzxnim)oL) (3.11) 


so that all terms containing 2,1 are collected іп L;(z; 1,24). This decomposition 
will allow us to move L;(z;) outside the integration, since its value does not depend 
on the integration variable x,_ 1: 


bel(zi) = n f exp{-Le} dx 


= m J Soto) = Li(zi)). ат 


= mexpí-lLi(zi)] 1-а) da, 4 (3.12) 


Furthermore, we will choose L;(z;. 1, x+) such that the value of the integral in (3.12) 
does not depend on x+. Thus, the integral will simply become a constant relative to the 
problem of estimating the belief distribution over x+. The resulting distribution over 
ж Will be entirely defined through L;(z;). 


bel(z; = n exp{—L;(z,)} (3.13) 


Let us now perform this decomposition. We are seeking a function L(z, 1,1) 
quadratic in z,..,. (This function will also depend on z+, but that shall not concern us 
at this point.) To determine the coefficients of this quadratic, we calculate the first two 
derivatives of L+: 


OL 


Эг = —AIRQ (m — Арыт — By шщ) + Dy (а—1 — ра) (3.14) 
ET 
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0? Lı 


2 
Or? у 


= A R AS yee eM (3.15) 


V, defines the curvature of L,(zi 1, £+). Setting the first derivative of L; to 0 gives 
us the mean: 


АТ Вг! '(z;— Ae te-1 В, ue) = XQ (mia ua) (3.16) 


This expression is now solved for z,. 1 


AT В! (te= Bi u) — AT В! Åt tt = NW ws i-Xpha 1 
AT В! Ак 2411 eet Tg = AT Ret (a, — By uz) + PUER me 
(AP Rz А. HEH) тр = AP Ret (ee — Bi w) + Uy ua 
Wola = ADR (t, Bi w) +EH na 

T1 = ©, [AP Ry! (zi — Be ue) + Xt m1] (3.17) 


11111 


Thus, we now have a quadratic function Г, (2,1, x+), defined as follows: 


L(ziizx) = (21 — V [AP В! (x — Be we) YXQ wap v 
(201 — V, [AT ВС (ж — Bi ut) + Уу ш—1]) (3.18) 


Clearly, this is not the only quadratic function satisfying our decomposition in (3.11). 
However, L;(zi1, £+) is of the common quadratic form of the negative exponent of a 
normal distribution. In fact the function 


Де ОШ 5 exer doa voy (3.19) 


is a valid probability density function (PDF) for the variable x,_1. As the reader easily 
verifies, this function is of the form defined in (3.1). We know from (2.5) that PDFs 
integrate to 1. Thus, we have 


1 
J aero exp(—L,(zx;.1,2;)) ах 1 = 1 (3.20) 
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From this it follows that 
1 
/ ехр{—[л(ж—1,х)} day_y = Че{(2тФ)2. (3.21) 


The important thing to notice is that the value of this integral is independent of £+, our 
target variable. Thus, for our problem of calculating a distribution over x+, this inte- 
gral is constant. Subsuming this constant into the normalizer 7, we get the following 
expression for Equation (3.12): 


bel(zi) = т exp{—Li(z1)} 1-а.) dx, 
= mexpí-L((x)) (3.22) 


Notice that the normalizers т) left and right of the equal sign are not the same. This 
decomposition establishes the correctness of (3.13). 


It remains to determine the function L;(z;), which is the difference of L,, defined in 
(3.10), and L,(z.1, £+), defined in (3.18): 


Liz) = h- Li(2i-1, v1) 

i (x; — At zii — В, uj)? Ву (ae А, zia — By ut) 

+ i (zi-1— pea)? XE (ria — a) 

-i(zi- ©, [AP Re? (ee — Be ut) + XC uw! 
(201 — ©, [AP Ду! (ae — By ut) + X£3 ш—1]) (3.23) 


Let us quickly verify that L,(r;) indeed does not depend on 2:1. To do so, we 
substitute back V, = (AT ВГ! A, + X )- , and multiply out the terms above. For 
the reader's convenience, terms that contain 2:;_ are underlined (doubly if they are 
quadratic in z;. 1). 


Г.(21) = 5 zl АТ В! At Ct-1 — rT AF d (2; т В; ш) 


+3 (x T В; uz)? Re? (2, = В; ш) 


1,T у-1 T yl 1 T у-1 
+5 95a M vici — Xia Ya puoi +5 Hii Yii Ht- 


-i тү (AP Ry! Ar +EH) ti 
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tal, [AP Ry? (а — Bi ut) + Ezt ш] 
—} [AP Ry? (a, — By u) + 1, weal’ (AP Вр ARX) 
[AT Ry! (x; — Be ui) + t3 pa] (3.24) 


It is now easily seen that all terms that contain x;_, cancel out. This should come at 
no surprise, since it is a consequence of our construction of L;(z, 1, £t). 


Lila) = +4 (te — Be us)? В! (xi — Be ue) 3 ia Xa ш—1 
—3 [Ar Ry” (zi — Be ue) + Ezi meal? (AP Ry? А+) 
[AT Вг! (x = В, ш) + x Ia] (3.25) 


Furthermore, Гг, (2;) is quadratic in тү. This observation means that bel(z;) is indeed 
normal distributed. The mean and covariance of this distribution are of course the 
minimum and curvature of L;(z,), which we now easily obtain by by computing the 
first and second derivatives of L(x;) with respect to x+: 


OL, (x+) 
Ox 


= Re (x = В, ш) — В! A, (AT В! A, + DE 
[AP Ry" (z, — By ui) + X43, nia] 

= [R;||—RSU At (AP Rpt А + XQ) AP RU] (zi — Bru) 
cuu Ap (АРЕ КАЧЫ) EXC ea (3.26) 


The inversion lemma stated (and proved) in Table 3.2 allows us to express the first 
factor as follows: 


Еу Ер A (Ap Ry? Ae + Ep) АР Ry) = (Re + As Dea Ap) 
(3.27) 


Hence the desired derivative is given by the following expression: 


OL, (x+) 
дт; 


= (В, + At Xi АГ)! (2; = В, ш) 


-R"A (AT Re ARS У) T S a (3.28) 
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with appropriate dimension, the following holds true 


Proof. It suffices to show that 


This is shown through a series of transformations: 


zu 


-I-ERH!P|IQP'- QP") = I 
—M— 


=0 


assuming that all above matrices can be inverted as stated. 


= (Q^! NS pr R^! py! pt Ro PQ РТ] 
= I-RCPIQP'-(Q'^-P'R'Py'(Q'-P'R^P)QPT| 
—————MÁ——— 


=I 


Inversion Lemma. For any invertible quadratic matrices R and Q and any matrix P 


(R--PQP')! = R- R P (Q7! + PT R! P)! PT R! 


(R! -— R! P (QE + PT R P)! P” R) (R+PQP") = I 


же —1 —1 E —1 —1 T —1 —1 T —1 
= RO BR RO POR? — ROP (Ol +P RP) Pt RR 


= R`! P (Q7! + PT ВР) PT R! PQ PY 
= I-RCOPQP = R! P(Q*+P' R P) PT 
В P (Q7! + PY R`! P) PT R! PQ PY 
= I+ R P|QPT – (Q7! + PT R! P)! PT 
_ (Q7! 4 PT R! py РТ R! PQ PT] 
= I+ R'P[QPT eO acp R P) Q"Q P 


mu 


М2 
at 


Table 3.2 The (specialized) inversion lemma. 


The minimum of Г; (2) is attained when the first derivative is zero. 


(Ri + А, У, АТ) (a, — By us) = Вг) Ay (AP Ret ALEX, 


14-1 y-1 
21) Уру Mea 


(3.29) 
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Solving this for the target variable x; gives us the surprisingly compact result 


T —1 T 1 1—1 1—1 
т = Brut (Rit Ar Xi- А), А, (А, В, At + 5,1) Era ua 
——— ——— 
AcrAY a ATRQ А, (QL-iAT Ry! Ae) 


В, Ut T At (I+ Xii ATR! At) [DAT Ret At + D! Ht—1 
—$$— 0—————————___, 


=I 
Bi u + At Ht—1 (3.30) 


Thus, the mean of the belief bel(zi) after incorporating the motion command u+ is 
Dy ut + Ак ш—1. This proves the correctness of Line 2 of the Kalman filter algorithm 
in Table 3.1. Line 3 is now obtained by calculating the second derivative of (жу): 


0? Li (2+) 


7 v (Ri + At Dt-1 Ap)" (3.31) 


This is the curvature of the quadratic function L;(x;), whose inverse is the covariance 
of the belief bel (x+). 


To summarize, we showed that the prediction steps in Lines 2 and 3 of the Kalman 
filter algorithm indeed implement the Bayes filter prediction step. To do so, we first 
decomposed the exponent of the belief bel(x;) into two functions, L;(x4~1, £+) and 
І.(2,). Then we showed that L;(a;—1,2;) changes the predicted belief bel(x+) only 
by a constant factor, which can be subsumed into the normalizing constant 7. Finally, 
we determined the function L;(r,) and showed that it results in the mean р; and 
covariance X, of the Kalman filter prediction bel(z;). 


Part 2: Measurement Update. We will now derive the measurement update in 
Lines 4, 5, and 6 (Table 3.1) of our Kalman filter algorithm. We begin with the general 
Bayes filter mechanism for incorporating measurements, stated in Equation (2.58) and 
restated here in annotated form: 


bel(zz) = m plz | 22) bel(x+) (3.32) 
—— —— 
ON (Cim Qu). ~N (mese 94) 


The mean and covariance of bel(x+) are obviously given by ji; and X;. The measure- 
ment probability p(z; | x+) was defined in (3.6) to be normal as well, with mean C; x; 
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and covariance Q+. Thus, the product is given by an exponential 

bel(z;) = техр{—%} (3.33) 
with 

Љ = $( О)" Qj (ж — Cii) +5 (i — p)? Уу (ae — pi3.34) 


This function is quadratic in x+, hence bel(z,) is a Gaussian. To calculate its parame- 
ters, we once again calculate the first two derivatives of J; with respect to x+: 


0J = 

as = OF QU (n – Com) + BE" (2: – №) Com 
t 

Ps Es e 

дт? = C; Qi ! Cet n» , (3.36) 
t 


The second term is the inverse of the covariance of bel (x+): 
E. (CF Qr' О,+ Г)! (3.37) 


The mean of bel(x;) is the minimum of this quadratic function, which we now calcu- 
late by setting the first derivative of J; to zero (and substituting u+ for x+): 


OF Qr (2 Сш) = Ey! (ue — Fe) (3.38) 
The expression on the left of the equal sign can be transformed as follows: 


СЕ Qi! (а — Come) 
= OF Qi (z — С, ш + Ct ii — Ct De) 
= OLQi (a — Cii) — CP QUI Ci (ш — №) (3.39) 


Substituting this back into (3.38) gives us 


СЕ О! (4-б) = (CP QU CO XL) (ue — no) (3.40) 
м у ————7 


—1 
=X, 
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and hence we have 
dt Cr Әг (2-С) = pe ie 


We now define the Kalman gain as 


К, = XO Qr' 
and obtain 
ш = fie + Ky (2 — Cy а) 
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(3.41) 


(3.42) 


(3.43) 


This proves the correctness of Line 5 in the Kalman filter algorithm in Table 3.1. 


The Kalman gain, as defined in (3.42), is a function of >. This is at odds with the 
fact that we utilize К, to calculate > in Line 6 of the algorithm. The following 
transformation shows us how to expresses K, in terms of covariances other than X. 
It begins with the definition of X; in (3.42): 


К, 


Daya Obes ON 


ECP OF (Cr БОД (ОЗОТ Ог! 
————M———— 


=F 


£; (СТ Oe Ci Ùe CT + OT Ог Qi) (Ci 3:07 + О). 


= І 


D(C O O UO uL C EOD 
D(C O OO EE DOCU ED qe 
—— 


ZI 


SEP OF Get Set) Sy CPG, Ss; CF 4-9) 
———— m 


= $3 
De Dy) X OF (Ci Xi CE + 0)! 
Se 


= І 


Or OSOE FO) (3.44) 


This expression proves the correctness of Line 4 of our Kalman filter algorithm. Line 
6 is obtained by expressing the covariance using the Kalman gain Кү. The advantage 


48 CHAPTER 3 


of the calculation in Table 3.1 over the definition in Equation (3.37) lies in the fact that 
we can avoid inverting the state covariance matrix. This is essential for applications 
of Kalman filters to high-dimensional state spaces. 


Our transformation is once again carried out using the inversion lemma, which was 
already stated in Table 3.2. Here we restate it using the notation of Equation (3.37): 


(3-0 QUO)! = 5-С (Qi + Ci X 0) Ci Ў, (3.45) 
This lets us arrive at the following expression for the covariance: 


X. = (CP Or б +)! 

ў. 5, CP (Qi + Ci 5, CP)? С, Ў, 

[1 —% СЕ (Qi + С, 5 ОТ)! Ci] De 

(I -— K; С) Ў, (3.46) 


This proves the correctness of Line 6 of our Kalman filter algorithm. 


3.3 THE EXTENDED KALMAN FILTER 


The assumptions of linear state transitions and linear measurements with added Gaus- 
sian noise are rarely fulfilled in practice. For example, a robot that moves with constant 
translational and rotational velocity typically moves on a circular trajectory, which 
cannot be described by linear next state transitions. This observation, along with the 
assumption of unimodal beliefs, renders plain Kalman filters, as discussed so far, in- 
applicable to all but the most trivial robotics problems. 


The extended Kalman filter (EKF) overcomes one of these assumptions: the linearity 
assumption. Here the assumption is that the next state probability and the measure- 
ment probabilities are governed by nonlinear functions g and h, respectively: 


Te = g(ussii)d 5 (3.47) 
h(x) + 0. (3.48) 


zt 


Gaussian Filters 49 


This model strictly generalizes the linear Gaussian model underlying Kalman filters, 
postulated in Equations (3.2) and (3.5). The function g replaces the matrices A, and B, 
in (3.2), and h replaces the matrix С, in (3.5). Unfortunately, with arbitrary functions g 
and h, the belief is no longer a Gaussian. In fact, performing the belief update exactly 
is usually impossible for nonlinear functions g and h, in the sense that the Bayes filter 
does not possess a closed-form solution. 


The extended Kalman filter (EKF) calculates an approximation to the true belief. It 
represents this approximation by a Gaussian. In particular, the belief bel(z;) at time 
t is represented by a mean u, and a covariance X+. Thus, the EKF inherits from the 
Kalman filter the basic belief representation, but it differs in that this belief is only 
approximate, not exact as was the case in Kalman filters. 


3.3.1 Linearization Via Taylor Expansion 


The key idea underlying the ЕКЕ is called linearization. Figure ?? illustrates the ba- 
sic concept. Suppose we are given a nonlinear next state function g. A Gaussian 
projected through this function is typically non-Gaussian. This is because nonlineari- 
ties in g distort the belief in ways that destroys its nice Gaussian shape, as illustrated in 
the figure. Linearization approximates g by a linear function that is tangent to g at the 
mean of the Gaussian. By projecting the Gaussian through this linear approximation, 
the posterior is Gaussian. In fact, once g is linearized, the mechanics of belief propa- 
gation are equivalent to those of the Kalman filter. The same argument applies to the 
multiplication of Gaussians when a measurement function h is involved. Again, the 
ЕКЕ approximates h by a linear function tangent to h, thereby retaining the Gaussian 
nature of the posterior belief. 


There exist many techniques for linearizing nonlinear functions. EKFs utilize a 
method called (first order) Taylor expansion. Taylor expansion construct a linear ap- 
proximation to a function g from g's value and slope. The slope is given by the partial 
derivative 


д (ute) = йит) (3.49) 
212—1 


Clearly, both the value of g and its slope depend on the argument of д. A logical 
choice for selecting the argument is to chose the state deemed most likely at the time 
of linearization. For Gaussians, the most likely state is the mean of the posterior j1;_1. 
In other words, g is approximated by its value at 144, (and at и), and the linear 
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extrapolation is achieved by а term proportional to the gradient of g at jjj. and uz: 


g(us, v4.1) x g(ui, ш-1) +r g (Ut, pai) (ig i = 4-1) 
Sey M 
=: Gi 
= 9(ш,ш-1) + Ge (xii — ш-1) (3.50) 


Written as Gaussian, the next state probability is approximated as follows: 


p(z: | Ut, 2—1) 
1 
~ det (27R,) 2 exp {—$ [xe — д(ш,ш-1) — Gr (ia — pii)l* 


Ry [ж — glu, pii) 7 Ge (а—1 — ш-1)]} (3.51) 


Notice that G; is a matrix of size n x n, with n denoting the dimension of the state. 
This matrix is often called the Jacobian. The value of the Jacobian depends оп 1; and 
Ht—1, hence it differs for different points in time. 


EKFs implement the exact same linearization for the measurement function h. Here 
the Taylor expansion is developed around j+, the state deemed most likely by the robot 
at the time it linearizes h: 


h(x) = hhe) + h'(u) (xi Be) 
—— 
=: Н; 
= №) + Н, (we — pe) (3.52) 
with h'(x) = аһа). Written as а Gaussian, we have 
= 1 Е 
plz |z) = det (279:) ? exp {—} [s — №) – Н, (ж, — д) 
Qrt [а — (д) — Hi (24 — №))} (3.53) 


3.3.2 The ЕКЕ Algorithm 


Table 3.3 states the EKF algorithm. In many ways, this algorithm is similar to the 
Kalman filter algorithm stated in Table 3.1. The most important differences are sum- 
marized by the following table: 
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Algorithm Extended Kalman filter(j,. .;, X4 1, Ut, zt): 
Pi = g(Ue, iii) 
х= Gi Ур GT +R 
К, = У; н (Н, У, HT + Qu. 
ш = № + Kin — h(u)) 
Sy = (I — Ki Hi) х, 
return ре, У 


Оу о ра 


Table 3.3 The extended Kalman б Iter (ЕКЕ) algorithm. 


| Kalman filter | ЕКЕ 
state prediction (Line 2) Ag ш-1 + By ut | glut, pui) 
measurement prediction (Line 5) C; ш Һр.) 


That is, the linear predictions in Kalman filters are replaced by their nonlinear gener- 
alizations in EKFs. Moreover, EKFs use Jacobians С; and Н, instead of the corre- 
sponding linear system matrices A+, B+, and С, in Kalman filters. The Jacobian С, 
corresponds to the matrices A, and B,, and the Jacobian Н, corresponds to С,. A 
detailed example for extended Kalman filters will be given in Chapter ??. 


3.3.3 Mathematical Derivation of the EKF 


The mathematical derivation of the EKF parallels that of the Kalman filter in Sec- 
tion 3.2.4, and hence shall only be sketched here. The prediction is calculated as 
follows (cf. (3.7)): 


bel(z,) = T p(z: | 2453, Ut) bel(z4 1) Чх+—1 
—— — 
ON (ж+}д(ик,ик—1)+С+(ж—1—и+—1),Н+) ~N (£t—1;Ht—-1,Xt—1) 
(3.54) 


This distribution is the EKF analog of the prediction distribution in the Kalman filter, 
stated in (3.7). The Gaussian p(x; | £+—1, uz) can be found in Equation (3.51). The 
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function L+ is given by (cf. (3.10)) 


Lp = i (2, — g(t, i1) — Gi(mi 1 — 1 3))7 
Rt (zi = glue, [E = Gi(ti-1 Ex ш-1)) 
Еа 3 (zi-1 — Шш D ae (£t—1 = i1) (3.55) 


which is quadratic in both z;..; and тү, as above. As in (3.11), we decompose L+ into 
Г.(21-1, Lt) and It (ж): 


Li(zi-i, 24) 
5 (a1 Ф, [СЕ Ку! (2, — g(ue, pii) + Giai) + Ж 1ш—1])1 97! 
(21-1 ©, [GP Ret (ж, — д(ш,ш-1) + Gita) Egli) (3.56) 


with 
& = (GPR, G+ E4)! (3.57) 
and hence 
Liz) = + (2, – glu, pii) + Guu-i)* Rz! (ж — glut, pai) + Саша) 
+ еа hia) XQ (вра Hri) 
– 5 [GF Ry? (2 — д(ш,ш—1) + Gui) + DE ea)” 
Ф, [СТ Ry* (ж – д(ш,ш-1) + Geter) + 3x ies] (3.58) 


As the reader easily verifies, setting the first derivative of L;(x;) to zero gives us the 
update и, = g(uz, е1), in analogy to the derivation in Equations (3.26) through 
(3.30). The second derivative is given by (А, + С, У; СТ)! (see (3.31). 


The measurement update is also derived analogously to the Kalman filter in Sec- 
tion 3.2.4. In analogy to (3.32), we have for the EKF 


bel(zi) = m p(z | 21) bel(x:) (3.59) 
—— 


—— 
~N (2¢;h(fit) +t (x-5), Qt) ~N (zoo) 
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using the linearized next state transition function from (3.52). This leads to the expo- 
nent (see (3.34)): 


zi Маъ) — He (ж — Bo))7 Qr (2 — а) — Н, (x2 — fio) 


Jt = 5 ( 
+5 (x; — д) Уу (ж — no) (3.60) 


The resulting mean and covariance is given by 


ш = quc Ki(zi = h(fit)) (3.61) 
X, = (I-K; Н) У, (3.62) 


with the Kalman gain 
К, = SA (H; $1 HE +Q)! (3.63) 


The derivation of these equations is analogous to Equations (3.35) through (3.46). 


3.3.4 Practical Considerations 


The EKF has become just about the most popular tool for state estimation in robotics. 
Its strength lies in its simplicity and in its computational efficiency. As was the case for 
the Kalman filter, each update requires time O(Kk?5 +n?), where k is the dimension of 
the measurement vector z+, and n is the dimension of the state vector x+. Other algo- 
rithms, such as the particle filter discussed further below, may require time exponential 
in n. 


The EKF owes its computational efficiency to the fact that it represents the belief by 
a multivariate Gaussian distribution. А Gaussian is a unimodal distribution, which 
can be thought of as a single guess, annotated with an uncertainty ellipse. In many 
practical problems, Gaussians are robust estimators. Applications of the Kalman filter 
to state spaces with 1,000 dimensions or more will be discussed in later chapters of 
this book. EKFs have been applied with great success to a number of state estimation 
problems that violate the underlying assumptions. 


Sometimes, one might want to pursue multiple distinct hypotheses. For example, a 
robot might have two distinct hypotheses as to where it is, but the arithmetic mean of 
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these hypotheses is not a likely contender. Such situations require multi-modal repre- 
sentations for the posterior belief. EKFs, in the form described here, are incapable of 
representing such multimodal beliefs. A common extension of EKFs is to represent 
posteriors using mixtures, or sums, of Gaussians. A mixture of J Gaussians may be of 
the form (cf. (22)): 


1 
bel(x) = 5 aj det (21354) 2 exp PCT — mna yet (ж, — Из) } 
2; 
(3.64) 
where a; are mixture parameters with a; > 0 and У j@j = 1. EKFs that utilize 


such mixture representations are called multi-hypothesis (extended) Kalman filters, of 
МНЕКЕ 


An important limitation of ће ЕКЕ arises from the fact that it approximates state 
transitions and measurements using linear Taylor expansions. In virtually all robotics 
problems, these functions are nonlinear. The goodness of this approximation depends 
on two main factors. First, it depends on the degree of nonlinearity of the functions that 
are being approximated. If these functions are approximately linear, the EKF approx- 
imation may generally be a good one, and EKFs may approximate the posterior belief 
with sufficient accuracy. However, sometimes, the functions are not only nonlinear, 
but are also multi-modal, in which case the linearization may be a poor approximation. 
The goodness of the linearization also depends on the degree of uncertainty. The less 
certain the robot, the wider its Gaussian belief, and the more it is affected by nonlin- 
earities in the state transition and measurement functions. In practice, when applying 
EKFs it is therefore important to keep the uncertainty of the state estimate small. 


We also note that Taylor series expansion is only one way to linearize. Two other ap- 
proaches have often been found to yield superior results. One is the unscented Kalman 
filter, which probes the function to be linearized at selected points and calculates a lin- 
earized approximation based on the outcomes of these probes. Another is known as 
moments matching, in which the linearization is calculated in a way that preserves the 
true mean and the true covariance of the posterior distribution (which is not the case 
for EKFs). Both techniques are relatively recent but appear to be superior to the EKF 
linearization. 
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3.4 THE INFORMATION FILTER 


The dual of the Kalman filter is the information filter. Just like the KF and its nonlin- 
ear version, the EKF, the information filter (IF) represents the belief by a Gaussian. 
Thus, the standard information filter is subject to the same assumptions underlying 
the Kalman filter. The key difference between the KF and the IF arises from the way 
the Gaussian belief is represented. Whereas in the Kalman filter family of algorithms, 
Gaussians are represented by their moments (mean, covariance), information filters 
represent Gaussians in their canonical representation, which is comprised of an in- 
formation matrix and an information vector. The difference in representation leads 
to different update equations. In particular, what is computationally complex in one 
representation happens to be simple in the other (and vice versa). The canonical and 
the moments representations are often considered dual to each other, and thus are the 
IF and the KF. 


3.4.1 Canonical Representation 


The canonical representation of a multivariate Gaussian is given by a matrix О and a 
vector é. The matrix О is the inverse of the covariance matrix: 


О = x, (3.65) 


Q is called the information matrix, or sometimes the precision matrix. The vector € is 
called the information vector. It is defined as 


= БК, (3.66) 


It is easy to see that Q and € are a complete parameterization of a Gaussian. In particu- 
lar, the mean and covariance of the Gaussian can easily be obtained from the canonical 
representation by the inverse of (3.65) and (3.66): 


У = Q! (3.67) 
и = Ore (3.68) 
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The canonical representation is often derived by multiplying out the exponent of a 
Gaussian. In (3.1), we defined the multivariate normal distribution as follows: 


p(r) = det (2л) 3 exp (- (zr — WTE (z — p) (3.69) 


A straightforward sequence of transformations leads to the following parameteriza- 
tion: 


BR 


р(х) = ае (27У) 2 exp(—-izTYX !z +E u- ШУ} 
1 
= det (27X)~2 exp([-iuTX tu} exp(—-izTX zx + 27D Gui) 
— SS 


const. 


The term labeled “const.” does not depend on the target variable x. Hence, it can be 
subsumed into the normalizer 7. 


р(х) = 7 exp {—}a" Do gta? yt m (3.71) 


This form motivates the parameterization of a Gaussian by its canonical parameters О 
and €. 


р(х) = mexp [7-127 Q z-4 zT E} (3.72) 


In many ways, the canonical representation is more elegant than the moments repre- 
sentation. In particular, the negative logarithm of the Gaussian (which plays an essen- 
tial role in information theory) is a quadratic function in the canonical parameters () 
and £: 


—logp(r) = const. + 217 Ох-х ё (3.73) 


Here “const.” is a constant. The reader may notice that we cannot use the symbol 7 
to denote this constant, since negative logarithms of probabilities do not normalize to 
1. The negative logarithm of our distribution p(x) is quadratic in z, with the quadratic 
term parameterized by О and the linear term by &. In fact, for Gaussians, €) must 
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Algorithm Information filter(£; .;,,; 4,0, z+): 
0, = (А OF, AT +R) 
& = QUA о &—1 + В, ut) 
О. = CT Qi! С. +0, 
& = 0T On: Z d 
return £r, 9; 


Qv шз гора 


Table3.4 The information fi Iter (IF) algorithm. 


be positive semidefinite, hence — log p(x) is a quadratic distance function with mean 
u = Q7! £. This is easily verified by setting the first derivative of (3.73) to zero: 


O[- log p(x)] 


25 0 = Ох—ё& = 0 — z-Q t (3.74) 


The matrix О determines the rate at which the distance function increases in the dif- 
ferent dimensions of the variable x. A quadratic distance that is weighted by a matrix 
Q is called Mahalanobis distance. 


3.4.2 The Information Filter Algorithm 


Table 3.4 states the update algorithm known as information filter. Its input is a Gaus- 
sian in its canonical representation £,..; and €),. 1, representing the belief at time t— 1. 
Just like all Bayes filters, its input includes the control и; and the measurement z+. The 
output are the parameters &; and О, of the updated Gaussian. 


The update involves matrices A+, Bt, Cy, Ri, and Qi. Those were defined in Sec- 
поп 3.2. The information filter assumes that the state transition and measurement 
probabilities are governed by the following linear Gaussian equations, originally de- 
fined in (3.2) and (3.5): 


te = Дь 1 + Brut + Et (3.75) 
Cyr, + б (3.76) 


@ 
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Here R, and Q; аге the covariances of the zero-mean noise variables =; and бү, respec- 
tively. 


Just like the Kalman filter, the information filter is updated in two steps, a prediction 
step and a measurement update step. The prediction step is implemented in Lines 2 
апа 3 in Table 3.4. The parameters & and О, describe the Gaussian belief over x; after 
incorporating the control u;, but before incorporating the measurement z;. The latter 
is done through Lines 4 and 5. Here the belief is updated based on the measurement 
Zt. 


These two update steps can be vastly different in complexity, especially if the state 
space possesses many dimensions. The prediction step, as stated in Table 3.4, involves 
the inversion of two matrices of the size n x n, where n is the dimension of the state 
space. This inversion requires approximately O(n®?8) time. In Kalman filters, the 
update step is additive and requires at most O(n?) time; it requires less time if only a 
subset of variables is affected by a control, or if variables transition independently of 
each other. These roles are reversed for the measurement update step. Measurement 
updates are additive in the information filter. They require at most O(n?) time, and 
they are even more efficient if measurements carry only information about a subset of 
all state variables at a time. The measurement update is the difficult step in Kalman 
filters. It requires matrix inversion whose worst case complexity is O(n??). This 
illustrates the dual character of Kalman and information filters. 


3.4.3 Mathematical Derivation of the 
Information Filter 


The derivation of the information filter is analogous to that of the Kalman filter. To 
derive the prediction step (Lines 2 and 3 in Table 3.4), we begin with the corresponding 
update equations of the Kalman filters, which can be found in Lines 2 and 3 of the 
algorithm in Table 3.1 and are restated here for the reader’s convenience: 


fe = Ai ш-1 + Bi uy (3.77) 
X = Aii AP +R, (3.78) 


The information filter prediction step follows now directly by substituting the mo- 
ments u and > by the canonical parameters £ and Q according to their definitions in 
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(3.67) and (3.72): 
Mt-1 = Q4 RET 
Suc m Өү (3.79) 


Substituting these expressions in (3.77) and (3.78) gives us the set of prediction equa- 
tions 


О, = AO AT +R)! (3.80) 
& = (A о &:-1 + Bi ш) (3.81) 


These equations are identical to those in Table 3.4. As is easily seen, the prediction 
step involves two nested inversions of a potentially large matrix. These nested inver- 
sions can be avoided when only a small number of state variables is affected by the 
motion update, a topic which will be discussed later in this book. 


The derivation of the measurement update is even simpler. We begin with the Gaussian 
of the belief at time t, which was provided in Equation (3.34) and is restated here once 
again: 


bel(z;) (3.82) 
= техр{—5 (ж — Cur)" Qr! (ж — Cin) — 5 (ae №) X (0 — fe) } 


For Gaussians represented in their canonical form this distribution is given by 


bel(x+) (3.83) 
= m exp (-3 Hr OF О! C, 2. xl OF Ox Z,— E xl 0,2; +a} &} 


which by reordering the terms in the exponent resolves to 
bel(r;) = m exp (-i rl ters Ort C; + £4] т, +21 [С ог 24 +&]} 


We can now read off the measurement update equations, by collecting the terms in the 
squared brackets: 


& = СОг até (3.84) 
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Algorithm Extended information filter(£, 1, €), 4,u;, z+): 


Q, = (Gi O7, GT + n)! 

& =O glut, 1-1) 

ДШ = ghur, ш-1) 

Q, = 0, + HT О! Н, 

& =& + HI QU [a - ha) — Н, ju] 


return £r, Qi 


O D on ce ou. a 


Table 3.5 The extended information fi lter (EIF) algorithm. 


о, = OF Q O+ (3.85) 


These equations are identical to the measurement update equations in Lines 4 and 5 of 
Table 3.4. 


3.4.4 The Extended Information Filter 
Algorithm 


The extended version of the information filter is analog to the EKF. Table 3.5 depicts 
the algorithm. The prediction is realized in Lines 2 through 4, and the measurement 
update in Lines 5 through 7. These update equations are largely analog to the linear 
information filter, with the functions g and h (and their Jacobian С; and Ну) replacing 
the parameters of the linear model A+, B;, and C;. As before, g and h specify the 
nonlinear next state function and measurement function, respectively. Those were 
defined in (3.47) and (3.48) and are restated here: 


® = ote, Be) d (3.86) 
h(z4) + ài . (3.87) 


zt 


Unfortunately, both g and h require a state as an input. This mandates the recovery of 
a state estimate и from the canonical parameters. The recovery takes place in Line 2, 
in which the state 14. 15 calculated from €),.., and €,_1 in the obvious way. Line 5 
computes the state jz; using the equation familiar from ће EKF (Line 2 in Table 3.3). 
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The necessity to recover the state estimate seems at odds with the desire to represent 
the filter using its canonical parameters. We will revisit this topic when discussing the 
use of extended information filters in the context of robotic mapping. 


3.4.5 Mathematical Derivation of the 
Extended Information Filter 


The extended information filter is easily derived by essentially performing the same 
linearization that led to the extended Kalman filter above. As in (3.50) and (3.52), 
EIFs approximate g and h by a Taylor expansion: 


glut, i51) © glut, pii) + Gi (20-1 — pui) (3.88) 


Q 


Неге С; and Н, are the Jacobians of g and h at ji... and H, respectively: 


б, = gl (ut, mes) (3.90) 
Н, = (uu) (3.91) 


These definitions are equivalent to those in the EKF. The prediction step is now derived 
from Lines 2 and 3 of the EKF algorithm (Table 3.3), which are restated here: 


3 = бяг В, (3.92) 
glue, I1) (3.93) 


= 
| 


Substituting >, by pb and р; by QF 1€, gives us the prediction equations of the 
extended information filter: 


О, = (Gi Qh GT + R)! (3.94) 
E ceo glun Qt, £1) (3.95) 


The measurement update is derived from Equations (3.59) and (3.60). In particular, 
(3.60) defines the following Gaussian posterior: 


bel(a,) = mexpi(-$ (% — №) — Hi (a — Ш) Q7! 
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(2 — h(t) — Н, (ж, — ii) 7 360 — fe) Уу (i — pa) 3.96) 


Multiplying out the exponent and reordering the terms gives us the following expres- 
sion for the posterior: 


bel(zx;)) = m exp {-4 zl HT Qr A, us Tal Hy Ог! [z — h(i) — Н, i] 
22:1 xt 24 + x mi 
= n ехр{—% [Ht Qr Het 3] ш 


+17 [HE Qj! [a — h(i) — Н me] + Уу! i] (3.97) 


With X7 1 = (), this expression resolves to the following information form: 


bel(z) = m exp(-ias? [H] Qj! Hit 4] x 
+ [H7 Qj [а — Аа) — Hi ja] + &] (3.98) 


We can now read off the measurement update equations by collecting the terms in the 
squared brackets: 


о, = O +H? Q H (3.99) 
& = &+ HT Qr! [az — М) — Н, ja] (3.100) 


3.4.6 Practical Considerations 


When applied to robotics problems, the information filter possesses several advantages 
over the Kalman filter. For example, representing global uncertainty is simple in the 
information filter: simply set €) — 0. When using moments, such global uncertainty 
amounts to a covariance of infinite magnitude. This is especially problematic when 
sensor measurements carry information about a strict subset of all state variables, a 
situation often encountered in robotics. Special provisions have to be made to handle 
such situations in EKFs. Furthermore, the information filter tends to be numerically 
more stable than the Kalman filter in many of the applications discussed later in this 
book. 


Another advantage of the information filter over the Kalman filter arises from its nat- 
ural fit for multi-robot problems. Multi-robot problems often involve the integration 
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of sensor data collected decentrally. Such integration is commonly performed through 
Bayes rule. When represented in logarithmic form, Bayes rule becomes an addition. 
As noted above, the canonical parameters of information filters represent a probability 
in logarithmic form. Thus, information integration is achieved by summing up infor- 
mation from multiple robots. Addition is commutative. Because of this, information 
filters can often integrate information in arbitrary order, with arbitrary delays, and in 
a completely decentralized manner. While the same is possible using the moments 
representation—after all, they represent the same information—the necessary over- 
head for doing so is much higher. Despite this advantage, the use of information filters 
in multi-robot systems remains largely under-explored. 


These advantages of the information filter are offset by important limitations. A pri- 
mary disadvantage of the EIF is the need to recover a state estimate in the update step, 
when applied to nonlinear systems. This step, if implemented as stated here, requires 
the inversion of the information matrix. Further matrix inversions are required for the 
prediction step of the information filters. In many robotics problems, the EKF does 
not involve the inversion of matrices of comparable size. For high dimensional state 
spaces, the information filter is generally believed to be computationally inferior to 
the Kalman filter. In fact, this is one of the reasons why the EKF has been vastly more 
popular than the EIF. 


As we will see later in this book, these limitations do not necessarily apply to problems 
in which the information matrix possess structure. In many robotics problems, the 
interaction of state variables is local; as a result, the information matrix may be sparse. 
Such sparseness does not translate to sparseness of the covariance. 


Information filters can be thought of as graphs, where states are connected whenever 
the corresponding off-diagonal element in the information matrix is non-zero. Sparse 
information matrices correspond to sparse graphs; in fact, such graphs are commonly 
known as Gaussian Markov random fields. A flurry of algorithms exist to perform 
the basic update and estimation equations efficiently for such fields, under names like 
“loopy belief propagation.” In this book, we will encounter a mapping problem in 
which the information matrix is (approximately) sparse, and develop an extended in- 
formation filter that is significantly more efficient than both Kalman filters and non- 
sparse information filters. 
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3.5 SUMMARY 


In this section, we introduced efficient Bayes filter algorithms that represent the pos- 
terior by multivariate Gaussians. We noted that 


m Gaussians can be represented in two different ways: The moments representation 
and the canonical representation. The moments representation consists of the 
mean (first moment) and the covariance (second moment) of the Gaussian. The 
canonical, or natural, representation consists of an information matrix and an 
information vector. Both representations are duals of each other, and each can be 
recovered from the other via matrix inversion. 


= Bayes filters can be implemented for both representations. When using the mo- 
ments representation, the resulting filter is called Kalman filter. The dual of the 
Kalman filter is the information filter, which represents the posterior in the canon- 
ical representation. Updating a Kalman filter based on a control is computation- 
ally simple, whereas incorporating a measurement is more difficult. The opposite 
is the case for the information filter, where incorporating a measurement is sim- 
ple, but updating the filter based on a control is difficult. 


m For both filters to calculate the correct posterior, three assumptions have to be 
fulfilled. First, the initial belief must be Gaussian. Second, the state transition 
probability must be composed of a function that is linear in its argument with 
added independent Gaussian noise. Third, the same applies to the measurement 
probability. It must also be linear in its argument, with added Gaussian noise. 
Systems that meet these assumptions are called linear Gaussian systems. 


m Both filters can be extended to nonlinear problems. The technique described in 
this chapter calculates a tangent to the nonlinear function. Tangents are linear, 
making the filters applicable. The technique for finding a tangent is called Taylor 
expansion. Performing a Taylor expansion involves calculating the first deriva- 
tive of the target function, and evaluating it at a specific point. The result of 
this operation is a matrix known as the Jacobian. The resulting filters are called 
"extended." 


m The accuracy of Taylor series expansions depends on two factors: The degree 
of nonlinearity in the system, and the width of the posterior. Extended filters 
tend to yield good results if the state of the system is known with relatively high 
accuracy, so that the remaining covariance is small. The larger the uncertainty, 
the higher the error introduced by the linearization. 


m  Oneofthe primary advantages of Gaussian filters is computational: The update 
requires time polynomial in the dimensionality of the state space. This is not 
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the case of some of the techniques described in the next chapter. The primary 
disadvantage is their confinement to unimodal Gaussian distributions. 


= Within the multivariate Gaussian regime, both filters, the Kalman filter and the 
information filter, have orthogonal strengths and weaknesses. However, the 
Kalman filter and its nonlinear extension, the extended Kalman filter, are vastly 
more popular than the information filter. 


The selection of the material in this chapter is based on today's most popular tech- 
niques in robotics. There exists a huge number of variations and extensions of the 
Gaussian filters presented here, that address the various limitations and shortcomings. 
One of the most apparent limitations of the material presented thus far is the fact that 
the posterior is represented by a single Gaussian. This confines these filters to situa- 
tions where the posterior can be described by a unimodal distribution. This is often 
appropriate in tracking applications, where a robot tracks a state variable with limited 
uncertainty. When uncertainty grows more global, a single mode can be insufficient, 
and Gaussians become too crude an approximation to the true posterior belief. This 
limitation has been well recognized, and even within the Gaussian paradigm exten- 
sions exist that can represent multimodal beliefs, e.g., using mixtures of Gaussians. 
Popular non-Gaussian approaches are described in the next chapter. 


3.6 BIBLIOGRAPHICAL REMARKS 
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CHAPTER 3 


NONPARAMETRIC FILTERS 


A popular alternative to Gaussian techniques are nonparametric filters. Nonparamet- 
ric filters do not rely on a fixed functional form of the posterior, such as Gaussians. 
Instead, they approximate posteriors by a finite number of values, each roughly cor- 
responding to a region in state space. Some nonparametric Bayes filters rely on a 
decomposition of the state space, in which each such value corresponds to the cumu- 
lative probability of the posterior density in a compact subregion of the state space. 
Others approximate the state space by random samples drawn from the posterior dis- 
tribution. In all cases, the number of parameters used to approximate the posterior 
can be varied. The quality of the approximation depends on the number of parameters 
used to represent the posterior. As the number of parameters goes to infinity, nonpara- 
metric techniques tend to converge uniformly to the correct posterior (under specific 
smoothness assumptions). 


This chapter discusses two nonparametric approaches for approximating posteriors 
over continuous spaces with finitely many values. The first decomposes the state space 
into finitely many regions, and represents the posterior by a histogram. A histogram 
assigns to each region a single cumulative probability; they are best thought of as 
piecewise constant approximations to a continuous density. The second technique 
represents posteriors by finitely many samples. The resulting filter is known as particle 
filter and has gained immense popularity in certain robotics problems. 


Both types of techniques, histograms and particle filters, do not make strong paramet- 
ric assumptions on the posterior density. In particular, they are well-suited to repre- 
sent complex multimodal beliefs. For this reason, they are often the method of choice 
when a robot has to cope with phases of global uncertainty, and when it faces hard data 
association problems that yield separate, distinct hypotheses. However, the represen- 
tational power of these techniques comes at the price of added computational com- 
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Algorithm Discrete Bayes filter((p; ; 1], Ut, 21): 
for all k do 
Prt = У, p(Xs = £p | Ut, Xii = zi) pa 
Prt = п p(z | Xt = хк) Pk 
endfor 
return {рк} 


QUA ee БӘ Ба 


Table 4.1 The discrete Bayes fi Iter. Here 2;, zg denote individual states. 


plexity. Fortunately, both nonparametric techniques described in this chapter make it 
possible to adapt the number of parameters to the (suspected) complexity of the pos- 
terior. When the posterior is of low complexity (e.g., focused on a single state with a 
small margin of uncertainty), they use only small numbers of parameters. For com- 
plex posteriors, e.g., posteriors with many modes scattered across the state space, the 
number of parameters grows larger. 


Techniques that can adapt the number of parameters to represent the posterior on- 
line are called adaptive. They are called resource-adaptive if they can adapt based 
on the computational resources available for belief computation. Resource-adaptive 
techniques play an important role in robotics. They enable robots to make decisions in 
real time, regardless of the computational resources available. Particle filters are often 
implemented as a resource-adaptive algorithm, by adapting the number of particles 
online based on the available computational resources. 


4.1 THE HISTOGRAM FILTER 


Histogram filters decompose the state space into finitely many regions, and represent 
the cumulative posterior for each region by a single probability value. When applied 
to discrete spaces, such filters are known as discrete Bayes filters. In continuous state 
spaces, they are known as histogram filters. We will first describe the discrete Bayes 
filter, and then discuss its use in continuous state spaces. 
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4.1.1 The Discrete Bayes Filter Algorithm 


Discrete Bayes filters apply to problems with finite state spaces, that is, where the 
random variable X; can take on finitely many values. We already encountered a dis- 
crete Bayes filter in Section 2.4.2, when discussing the example of a robot estimating 
the probability that a door is open. Some of the robotic mapping problems discussed 
in later chapters also involve discrete random variables. For example, occupancy grid 
mapping algorithms assume that each location in the environment is either occupied or 
free. The corresponding random variable is binary, that is, it can take on two different 
values. Thus, finite state spaces play an important role in robotics. 


Table 4.1 provides pseudo-code for the discrete Bayes filter. This code is derived from 
the general Bayes filter in Table 2.1 by replacing the integration with a finite sum. The 
variables x; and x, denote individual states, of which there may only be finitely many. 
The belief at time ¢ is an assignment of a probability to each state xg, denoted рь ;. 
Thus, the input to the algorithm is a discrete probability distribution {рь}, along with 
the most recent control и; and measurement z+. Line 3 calculates the prediction, the 
belief for the new state based on the control alone. This prediction is then updated in 
Line 4, so as to incorporate the measurement. The discrete Bayes filter algorithm is 
popular in many areas of signal processing (such as speech recognition), where it is 
often referred to as the forward pass of a hidden Markov models. 


4.1.2 Continuous State 


Of particular interest in this book will be the use of discrete Bayes filters as an ap- 
proximate inference tool for continuous state spaces. Such filters are called histogram 
filters. Histogram filters decompose a continuous state space into finitely many re- 
gions: 


range(X;) = xi4UXo4U...Xk4 (4.1) 


Here X, is the familiar random variable describing the state of the robot at time t. The 
function range( X+) denotes the state space, that is, the universe of possible values that 
X, might assume. Each хь ; describes a convex region. These regions together form 
a partitioning of the state space, that is, for each à # k we have X; t N Xk t = 0 and 
U; Xk, = range( X+). A straightforward decomposition of a continuous state space 
is a multi-dimensional grid, where each х is a grid cell. Through the granularity 
of the decomposition, we can trade off accuracy and computational efficiency. Fine- 
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grained decompositions infer smaller approximation errors than coarse ones, but at the 
expense of increased computational complexity. 


As we already discussed, the discrete Bayes filter assigns to each region хь; a proba- 
bility, pj, ;. Within each region, the discrete Bayes filter carries no further information 
on the belief distribution. Thus, the posterior becomes a piecewise constant PDF, 
which assigns a uniform probability to each state x; within each region x, ғ: 


pa) = “= 42) 


Here |x;, | is the volume of the region хь ү. 


If the state space is truly discrete, the conditional probabilities p(xx; | Ut, Xi, 1) 
and p(z: | хк) are well-defined, and the algorithm can be implemented as stated. In 
continuous state spaces, one is usually given the densities p(z; | ut, x; 1) and p(z | 
24), which are defined for individual states (and not for regions in state space). For 
cases where each region x; ; is small and of the same size, these densities are usually 
approximated by substituting x; by a representative of this region. For example, we 
might simply “probe” using the mean state in X; ; 


$i = [xk t|" | Tt da, (4.3) 
Хк, 


One then simply replaces 


plz: | Хк) x plz | Ext) (4.4) 
P(Xk,t | Ut, Xi t—1) x aa plêk, | Ut, $441) (4.5) 
[xxt] 


These approximations are the result of the piecewise uniform interpretation of the 
discrete Bayes filter stated in (4.2), and a linearization similar to the one used by 
EKFs. 


To see that (4.4) is a reasonable approximation, we note that р(2; | Xķ,t) can be 
expressed as the following integral: 


р(21, хр) 


2, х = 
p( t | k,t) p(xa.i) 
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Sana p(zi, zi) doi 


Ta p(x) da, 


LE p(z 


Lt) p(x) dz, 


i plz 


La } 1 dax, 


[xi elt | р(& | ve) da 
Xk,t 
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(4.6) 


This expression is an exact description of the desired probability under the piecewise 
uniform distribution model in (4.2). If we now approximate p(z; | x+) by p(z: | 14) 
for x; € хк, we obtain 


plz | Хк) 


|хк+ Б: | р( | fet) doi 
Хк, 


|хк+ 7! p(z | эы) | 1 da, 
Xk,t 


[Xt a p(z | S) [xx «| 


р(ж | 2x4) 


which is the approximation stated above in (4.4). 


(4.7) 


The derivation of the approximation to p(xx,; | Ut, xi4—1) in (4.5) is slightly more 
involved, since regions occur on both sides of the conditioning bar. In analogy to our 
transformation above, we obtain: 


P(Xk,t | Ut, Xi t—1) 


P(Xk t, Xi,t—1 | ш) 


p(xia-a | Ue) 


Send Jisa p(zi, zi—3 | ш) dae, d£t—1 


кн p(zi-i | ut) 2:1 
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E UM P(t | Ut, 221) P(Le-1 | Ue) dae, dzi-i 
Jes р(а—1 | ut) dzi-aà 


(4.8) 


We now exploit the Markov assumption, which implies independence between x+_ 
and wz, and thus р(х, | ui) = p(zi-i): 


P(Xk,t | Ut, Xi t—1) 
Jens docs p(zi Ut, X1— ) р(ж,—1) da, da, i 
(ир р(а{—1) dva 

[з e pla | ше, zii) р 


|xi.e-al 


Di,t-1 d 
dod Er Tii 


dz, das i 


e deas 1 € Ut, Lt— ) dai, d£t—1 
1dz,. 1 


dez 1 


[Xi t- m d. / P(T | Ut, mes 1) dai, d£t—1 (4.9) 
Xi, t—1 


If we now approximate p(z, | Ut, £t—1) by p(Zy, | Ut, 2:11) as before, we obtain 
the following approximation. Note that the normalizer 7) becomes necessary to ensure 
that the approximation is a valid probability distribution: 


P(Xk,t | Ut, Xi, t— 1) 


5 N |Xi t- |7 Ж i р(Фк | Ut, gi 1) dz, х1 
Xi,t—1 
= m xia | px, | usus) | / 1 2,4211 
Xk,t Xi,t—1 


7] xia | Plk | ut, $i 3i) xl [xil 


= 7] Xr plêk, | Ut, 2543) (4.10) 


If all regions are of equal size (meaning that [х ; | is the same for all k), we can simply 
omit the factor |X% |, since it is subsumed by the normalizer. The resulting discrete 
Bayes filter is then equivalent to the algorithm outlined in Table 4.1. If implemented 
as stated there, the auxiliary parameters pw do not constitute a probability distribution, 
since they are not normalized (compare Line 3 to (4.10)). However, normalization 
takes place in Line 4, so that the output parameters are indeed a valid probability 
distribution. 
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4.1.3 Decomposition Techniques 


In robotics, decomposition techniques of continuous state spaces come into two ba- 
sic flavors: static and dynamic. Static techniques rely on a fixed decomposition that 
is chosen in advance, irrespective of the shape of the posterior that is being approx- 
imated. Dynamic techniques adapt the decomposition to the specific shape of the 
posterior distribution. Static techniques are usually easier to implement, but they can 
be wasteful with regards to computational resources. 


A primary example of a dynamic decomposition technique is the family of density 
trees. Density trees decompose the state space recursively, in ways that adapt the 
resolution to the posterior probability mass. The intuition behind this decomposition 
is that the level of detail in the decomposition is a function of the posterior proba- 
bility: The less likely a region, the coarser the decomposition. Figure 4.1 illustrates 
the difference between a static grid representation and a density tree representation 
for a two-dimensional probability density. While both representations have the same 
approximation quality, the density tree is more compact than the grid representation. 
Dynamic techniques like density trees can often cut the computation complexity by 
orders of magnitude over static ones, yet they require additional implementation ef- 
fort. 


An effect similar to that of dynamic decompositions can be achieved by selective up- 
dating. When updating a posterior represented by a grid, selective techniques update 
a fraction of all grid cells only. A common implementation of this idea updates only 
those grid cells whose posterior probability exceeds a user-specified threshold. Selec- 
tive updating can be viewed as a hybrid decomposition, which decomposes the state 
space into a fine grained grid and one large set that contains all regions not chosen 
by the selective update procedure. In this light, it can be thought of as a dynamic 
decomposition technique, since the decision as to which grid cells to consider during 
the update is made online, based on the shape of the posterior distribution. Selective 
updating techniques can reduce the computational effort involved in updating beliefs 
by many orders of magnitude. They make it possible to use grid decompositions in 
spaces of three or more dimensions. 


The mobile robotics literature often distinguishes topological from metric representa- 
tions of space. While no clear definition of these terms exist, topological represen- 
tations are often thought of as course graph-like representations, where nodes in the 
graph correspond to significant places (or features) in the environment. For indoor 
environments, such places may correspond to intersections, T-junctions, dead ends, 
and so on. The resolution of such decompositions, thus, depends on the structure of 
the environment. Alternatively, one might decompose the state space using regularly- 
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(а) (b) 


Figure 4.1 (а) Grid based representation of a two-dimensional probability density. The 
probability density concentrates on the upper right and lower left part of the state space. (b) 
Density tree representation of the same distribution. 


spaced grids. Such a decomposition does not depend on the shape and location of the 
environmental features. Grid representations are often thought of as metric although, 
strictly speaking, it is the embedding space that is metric, not the decomposition. In 
mobile robotics, the spatial resolution of grid representations tends to be higher than 
that of topological representations. For instance, some of the examples in Chapter 7 
use grid decompositions with cell sizes of 10 centimeters or less. This increased accu- 
racy comes at the expense of increased computational costs. 


4.1.4 Binary Bayes Filters With Static State 


Certain problems in robotics are best formulated as estimation problems with binary 
state that does not change over time. Problems of this type arise if a robot estimates 
a fixed binary quantity in the environment from a sequence of sensor measurements. 
Since the state 1s static, the belief is a function of the measurements: 


bel,(a:) = p(x | Zit Ur:t) = p(x | 214) (4.11) 


where the state is chosen from two possible values, denoted by x and ^x. In particular, 
we have bel;(^x) = 1 — bel; (x). The lack of a time index for the state x reflects the 
fact that the state does not change. 


Naturally, binary estimation problems of this type can be tackled using the discrete 
Bayes filter in Table 4.1. However, the belief is commonly implemented as a log odds 
ratio. The odds of a state x is defined as the ratio of the probability of this event 
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1: Algorithm binary Bayes filter(l, ;, z+): 
Ж (222) (x) 
l = l-1 + log 12а) — 108 pG) 
3: return l 


Table 4.2 The binary Bayes fi Iter in log odds form with an inverse measurement model. 
Here l+ is the log odds of the posterior belief over a binary state variable that does not 
change over time. 


divided by the probability of its negate 


P(t) _ pz) (4.12) 
p(x) 1 — p(x) 
The log odds is the logarithm of this expression 
I(x) := log BC M (4.13) 
1 — p(z) 


Log odds assume values from —oo to oo. The Bayes filter for updating beliefs in 
log odds representation is computationally elegant. It avoids truncation problems that 
arise for probabilities close to 0 or 1. 


Table 4.2 provides the basic update algorithm. This algorithm is additive; in fact, any 
algorithm that increment and decrement a variable in response to measurements can be 
interpreted as a Bayes filter in log odds form. This binary Bayes filter uses an inverse 
measurement model p(x | 21), instead of the familiar forward model p(z; | х). The 
inverse measurement model specifies a distribution over the (binary) state variable as 
a function of the measurement z;. Inverse models are often used in situations where 
measurements are more complex than the binary state. An example of such a situation 
is the problem of estimating whether or not a door is closed, from camera images. 
Here the state is extremely simple, but the space of all measurements is huge. It is 
easier to devise a function that calculates a probability of a door being closed from 
a camera image, than describing the distribution over all camera images that show a 
closed door. In other words, it is easier to implement an inverse than a forward sensor 
model. 
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As the reader easily verifies from our definition of the log odds (4.13), the belief 
bel, (2) can be recovered from the log odds ratio l; by the following equation: 


1 


о e MESE 


(4.14) 


To verify the correctness of our binary Bayes filter algorithm, we briefly restate the 
basic filter equation with the Bayes normalizer made explicit: 


plz | v, Z14—1) p(x | Z1:t-1) 
plz | 214—1) 
р( | x) p(x | 214—1) 


P(t | at) = 


= 4.15 
р(& | 214-1) | ) 
We now apply Bayes rule to ће measurement model p(z; | 2): 


р(х) 
and obtain 


_ pla | zo) p(z) pla | 214-1) 
НОЕ p(x) p(t | 212-1) | (4-17) 


By analogy, we have for the opposite event ^x: 


p(^x | zt) plz) р(х | 214-1) 
ее 41 
p(ow | za) SE CU Eid Men 


Dividing (4.17) by (4.18) leads to cancellation of various difficult-to-calculate proba- 
bilities: 


plz |z) p(e| zt) ple | ati) pOr) 


а POEL) Pew) WD 
2l RONEN ова). des) 
БЕРЕ ШО a 
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We denote the log odds ratio of the belief bel; (2) by 1, (2). The log odds belief at time 
t is given by the logarithm of (4.19). 


ums p(x | 20) B p(z | 214—1) Ы 1—р(@) 
(ж) = 1 eT HGS) + 1 Sar т] +1 ET 
— ор Pel) p(z) " 
= кол crc Е (4.20) 


Here p(x) is the prior probability of the state x. As (4.20), each measurement update 
involves the addition of the prior (in log odds form). The prior also defines the log 
odds of the initial belief before processing any sensor measurement: 


1— p(x) 


4.21 
p(x) vel 


lo(z) = log 


4.2 THE PARTICLE FILTER 


4.2.1 Basic Algorithm 


The particle filter is an alternative nonparametric implementation of the Bayes filter. 
Just like histogram filters, particle filters approximate the posterior by a finite number 
of parameters. However, they differ in the way these parameters are generated, and in 
which they populate the state space. The key idea of the particle filter is to represent 
the posterior bel(z;) by a set of random state samples drawn from this posterior. Fig- 
ure ?? illustrates this idea for a Gaussian. Instead of representing the distribution by 
a parametric form (the exponential function that defines the density of a normal dis- 
tribution), particle filters represent a distribution by a set of samples drawn from this 
distribution. Such a representation is approximate, but it is nonparametric, and there- 
fore can represent a much broader space of distributions than, for example, Gaussians. 


In particle filters, the samples of a posterior distribution are called particles and are 
denoted 


A, = call! (4.22) 


Each particle кї"! (with 1 < m < М) is a concrete instantiation of the state at time t, 


that is, a hypothesis as to what the true world state may be at time t. Here M denotes 
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1 Algorithm Particle filter(AX, ., uz, 21): 
2 А, = А, = 0 

3 for m — 1 to M do 

4 sample al") ~ plz: | uj, xt ) 
5: wt"! = p(ze | af") 

6: Xi = X + (xl) wll) 

7 endfor 

8 for m — 1 to M do 

9: draw i with probability oc w!” 
10: add z” to 2, 

11: endfor 

12: return 2, 


Table 4.3 The particle fi Iter algorithm, a variant of the Bayes fi ter based on importance 
sampling. 


the number of particles in the particle set V;. In practice, the number of particles M 
is often a large number, e.g., M — 1,000. In some implementations M is a function 
of t or of other quantities related to the belief bel(;). 


The intuition behind particle filters is to approximate the belief bel(x+) by the set of 
particles А;. Ideally, the likelihood for a state hypothesis x; to be included in the 
particle set 2, shall be proportional to its Bayes filter posterior bel (x+): 


[m] 


тү p(X, | Zit, Urt) (4.23) 


As a consequence of (4.23), the denser a subregion of the state space is populated by 
samples, the more likely it is that the true state falls into this region. As we will discuss 
below, the property (4.23) holds only asymptotically for M 1 oo for the standard 
particle filter algorithm. For finite M, particles are drawn from a slightly different 
distribution. In practice, this difference is negligible as long as the number of particles 
is not too small (e.g., M > 100). 


Just like all other Bayes filter algorithms discussed thus far, the particle filter algo- 
rithm constructs the belief bel (x+) recursively from the belief bel(x,_1) one time step 
earlier. Since beliefs are represented by sets of particles, this means that particle filters 
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construct the particle set X, recursively from the set 41. The most basic variant 
of the particle filter algorithm is stated in Table 4.3. The input of this algorithm is 
the particle set X;—1, along with the most recent control и; and the most recent mea- 
surement z+. The algorithm then first constructs a temporary particle set Æ which is 
reminiscent (but not equivalent) to the belief be/(z;). It does this by systematically 


processing each particle al in the input particle set А%—1 as follows. 


1. Line 4 generates a hypothetical state 21" for time t based on the particle Eu 


and the control u;. The resulting sample is indexed by m, indicating that it is 
generated from the m-th particle in 4%_1. This step involves sampling from the 
next state distribution р(х, | ut, 2.1). To implement this step, one needs to 
be able to sample from p(z, | ut, 2:1). The ability to sample from the state 
transition probability is not given for arbitrary distributions p(z, | Ut, £e—1). 
However, many major distributions in this book possess efficient algorithms for 
generating samples. The set of particles resulting from iterating Step 4 M times 
is the filter’s representation of bel (x+). 


2. Line 5 calculates for each particle 21" the so-called importance factor, denoted 


wl), Importance factors are used to incorporate the measurement z; into the 
particle set. The importance, thus, is the probability of the measurement 2, under 
the particle al, that is, wl” = p(z | oll), If we interpret ш!" as the weight 
of a particle, the set of weighted particles represents (in approximation) the Bayes 


filter posterior bel(z,). 


3. The real “trick” of the particle filter algorithm occurs in Lines 8 through 11 in Ta- 
ble 4.3. These lines implemented what is known as resampling or importance re- 
sampling. The algorithm draws with replacement M particles from the temporary 
set X. The probability of drawing each particle is given by its importance weight. 
Resampling transforms a particle set of M particles into another particle set of the 
same size. By incorporating the importance weights into the resampling process, 
the distribution of the particles change: whereas before the resampling step, they 
were distribution according to bel(x;), after the resampling they are distributed 
(approximately) according to the posterior bel(z;) = n p(z | al”"!)bel(a,). In 
fact, the resulting sample set usually possesses many duplicates, since particles 
are drawn with replacement. More important are the particles that are not con- 
tained in 2%: those tend to be the particles with lower importance weights. 


The resampling step has the important function to force particles back to the posterior 
Бе1(2:,). In fact, an alternative (and usually inferior) version of the particle filter would 
never resample, but instead would maintain for each particle an importance weight 
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that is initialized by 1 and updated multiplicatively: 


[m] 


wl" = pla | al”) wl) (4.24) 


Such a particle filter algorithm would still approximate the posterior, but many of its 
particles would end up in regions of low posterior probability. As a result, it would 
require many more particles; how many depends on the shape of the posterior. The 
resampling step is a probabilistic implementation of the Darwinian idea of survival 
of the fittest: It refocuses the particle set to regions in state space with high posterior 
probability. By doing so, it focuses the computational resources of the filter algorithm 
to regions in the state space where they matter the most. 


4.2.2 Importance Sampling 


For the derivation of the particle filter, it shall prove useful to discuss the resampling 
step in more detail. Figure 4.2 illustrates the intuition behind the resampling step. 
Figure 4.2a shows a density function f of a probability distribution called the target 
distribution. What we would like to achieve is to obtain a sample from f. However, 
sampling from f directly may not be possible. Instead, we can generate particles from 
a related density, labeled g in Figure 4.2b. The distribution that corresponds to the 
density g is called proposal distribution. The density g must be such that f(x) > 0 
implies g(x) > 0, so that there is a non-zero probability to generate a particle when 
sampling from g for any state that might be generated by sampling from f. However, 
the resulting particle set, shown at the bottom of Figure 4.2b, is distributed according 
to g, not to f. In particular, for any interval А C range(X) (or more generally, any 
Borel set A) the empirical count of particles that fall into А converges to the integral 
of g under A: 


га У rae — f g(a) dz (4.25) 
m=1 


To offset this difference between f and g, particles x!" are weighted by the quotient 


(4.26) 
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2 4 6 8 10 12 


2 4 6 8 10 12 


Figure4.2 Illustration of importance factors in particle fi Кегѕ: (а) We seek to approximate 
the target density f. (b) Instead of sampling from f directly, we can only generate samples 
from a different density, g. Samples drawn from g are shown at the bottom of this diagram. 
(с) A sample of f is obtained by attaching the weight f(x)/g(x) to each sample x. In 
particle filters, f corresponds to the belief bel (2%) and g to the belief bel (x+). 
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This is illustrated by Figure 4.2c: The vertical bars in this figure indicate the magnitude 
of the importance weights. Importance weights are the non-normalized probability 
mass of each particle. In particular, we have 


m=i m=1 


M -1 м 
ps н) У ("їе А) ш”! — ]. f(a) dx (4.27) 
A 
where the first term serves as the normalizer for all importance weights. In other 
words, even though we generated the particles from the density g, the appropriately 
weighted particles converge to the density f. 


The specific convergence involves an integration over a set А. Clearly, a particle set 
represents a discrete distribution, whereas f is continuous in our example. Because 
of this, there is no density that could be associated with a set of particles. The con- 
vergence, thus, is over the cumulative distribution function of f, not the density itself 
(hence the integration over A). A nice property of importance sampling is that it con- 
verges to the true density if g(z) > 0 whenever f(x) 0. In most cases, the rate of 
convergence is in Oz)» where M is the number of samples. The constant factor 


depends on the similarity of f(s) and g(s). 
In particle filters, the density f corresponds to the target belief bel(x,). Under the 


(asymptotically correct) assumption that the particles in 2, _ у are distributed according 
to bel(x+—1), the density g corresponds to the product distribution: 


p(zs | Ut, 241) bel(z 1) (4.28) 


This distribution is called the proposal distribution. 


4.2.3 Mathematical Derivation of the PF 


To derive particle filters mathematically, it shall prove useful to think of particles as 
samples of state sequences 


[m] _ [т] [mm] [т] (4.29) 


Жо: = Жу Ti ,...,4{ 


[m] 


It is easy to modify the algorithm accordingly: Simply append to the particle x; 
the sequence of state samples from which it was generated ab. This particle filter 
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calculates the posterior over all state sequences: 
bel(xo4) =  p(xou | Urt, zia) (4.30) 


instead of the belief bel(z4) = р(х, | u14, 21:4). Admittedly, the space over all state 
sequences is huge, and covering it with particles is usually plainly infeasible. How- 
ever, this shall not deter us here, as this definition serves only as the means to derive 
the particle filter algorithm in Table 4.2. 


The posterior bel(xo.;) is obtained analogously to the derivation of bel(z,) in Sec- 
tion 2.4.3. In particular, we have 


p(zo: | АЕТ u14) 


"ET np(a | zoe 2101,01) P(rox | zis-i unc) 
Merkov (ze | te) p(zos | zut-i, un) 

= тр(2 | 21) p(t | Vot-1, 210-1, 014) Р(®ох—1 | zi4—1, 214) 
MEY p p(z | ate) plate | erst) P(tos-a | ач,ал) eu 


Notice the absence of integral signs in this derivation, which is the result of maintain- 
ing all states in the posterior, not just the most recent one as in Section 2.4.3. 


The derivation is now carried out by induction. The initial condition is trivial to verify, 
assuming that our first particle set is obtained by sampling the prior p(xo). Let us 
assume that the particle set at time t— 1 is distributed according to bel(xo-4-1). For the 
m-th particle ol i in this set, the sample al” generated in Step 4 of our algorithm 
is generated from the proposal distribution: 


p(zi | Dy 1s Ut) bel(xo:t-1) 


= pmi 2:90) p»(294-1 | 204-4, 003—1) (4.32) 
With 


[m] target distribution 
wi = 


proposal distribution 
n p(z | Tt) p(zi | Tt—1, Ut) p(£o:t—1 | 214—1, Ul:t—1) 
p(zi | L4-1, Ut) P(®o:t-1 | 29:t—1; 9:t—1) 


T n p(z | T) (4.33) 
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The constant 7) plays no role since the resampling takes place with probabilities pro- 
portional to the importance weights. By resampling particles with probability propor- 
tional to ш, the resulting particles are indeed distributed according to the product 


of the proposal and the importance weights wll: 


[m] 


пи, р(х. | zi a, Ut) p(xou—i | Zou—1,uo4-1) = 55е (20:0) (4.34) 


(Notice that the constant factor 7 here differs from the one in (4.33).) The algorithm in 


Table 4.2 follows now from the simple observation that if xml is distributed according 


[m] 


to bel(xo:z), then the state sample ж, is (trivially) distributed according to bel(x+). 


As we will argue below, this derivation is only correct for M — оо, due to a laxness 
in our consideration of the normalization constants. However, even for finite M it 
explains the intuition behind the particle filter. 


4.2.4 Properties of the Particle Filter 


Particle filters are approximate and as such subject to approximation errors. There 
are four complimentary sources of approximation error, each of which gives rise to 
improved versions of the particle filter. 


1. The first approximation error relates to the fact that only finitely many particles 
are used. This artifact introduces a systematic bias in the posterior estimate. 
To see, consider the extreme case of M — 1 particle. In this case, the loop in 
Lines 3 through 7 in Table 4.3 will only be executed once, and А, will contain 
only a single particle, sampled from the motion model. The key insight is that 
the resampling step (Lines 8 through 11 in Table 4.3) will now deterministically 
accept this sample, regardless of its importance factor wl), Put differently, the 
measurement probability p(z; | тЇ"! plays no role in the result of the update, 
and neither does z;. Thus, if M — 1, the particle filter generates particles from 
the probability 


p(z« | ui) (4.35) 


instead of the desired posterior p(x; | u1:+, 214). It flatly ignores all measure- 
ments. How can this happen? 


The culprit is the normalization, implicit in the resampling step. When sampling 
in proportion to the importance weights (Line 9 of the algorithm), ші" becomes 
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its own normalizer if M — 1: 


NC 
p(draw 21"! in Line9) = mpl (4.36) 
Wt 


In general, the problem is that the non-normalized values w;[m] are drawn from 
an M -dimensional space, but after normalization they reside in a space of dimen- 
sion M —1. This is because after normalization, the m-th weight can be recovered 
from the M — 1 other weights by subtracting those from 1. Fortunately, for larger 
values of M, the effect of loss of dimensionality, or degrees of freedom, becomes 
less and less pronounced. 


2. A second source of error in the particle filter relates to the randomness intro- 
duced in the resampling phase. To understand this error, it will once again be 
useful to consider the extreme case, which is that of a robot whose state does not 
change. Sometimes, we know for a fact that x; = x,..1. А good example is that 
of mobile robot localization, for a non-moving robot. Let us furthermore assume 
that the robot possesses no sensors, hence it cannot estimate the state, and that 
it is unaware of the state. Initially, our particle set Xo will be generated from 
the prior; hence particles will be spread throughout the state space. The random 
nature of the resampling step (Line 8 in the algorithm) will regularly fail to draw 
a state sample gll, However, since our state transition is deterministic, no new 
states will be introduced in the forward sampling step (Line 4). The result is quite 
daunting: With probability one, M identical copies of a single state will survive; 
the diversity will disappear due to the repetitive resampling. To an outside ob- 
server, it may appear that the robot has uniquely determined the world state—an 
apparent contradiction to the fact that the robot possesses no sensors. 


This example hints at an important limitation of particle filters with immense 
practical ramifications. In particular, the resampling process induces a loss of di- 
versity in the particle population, which in fact manifests itself as approximation 
error. Such error is called variance of the estimator: Even though the variance of 
the particle set itself decreases, the variance of the particle set as an estimator of 
the true belief increases. Controlling this variance, or error, of the particle filter 
is essential for any practical implementation. 


There exist two major strategies for variance reduction. First, one may reduce the 
frequency at which resampling takes place. When the state is known to be static 
(2; = 24,1) one should never resample. This is the case, for example, in mobile 
robot localization: When the robot stops, resampling should be suspended (and 
in fact it is usually a good idea to suspend the integration of measurements as 
well). Even if the state changes, it is often a good idea to reduce the frequency of 
resampling. Multiple measurements can always be integrated via multiplicatively 
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Algorithm Low variance sampler(4;, УМ): 


А, = 0 

т = rand(0; M^!) 
с = wl! 

1= 1 


for m = 1 to M do 
u-r-ct-(m-1).M^! 
while u >с 
i—i-cl 
c— c-r w” 
endwhile 
add z! to 2, 
endfor 
return 2, 


Table 4.4 Low variance resampling for the particle filter. This routine uses a single ran- 
dom number to sample from the particle set А? with associated weights W, yet the probabil- 
ity of a particle to be resampled is still proportional to its weight. Furthermore, the sampler 
is effi cient: Sampling M particles requires O( M) time. 


updating the importance factor as noted above. More specifically, it maintains the 
importance weight in memory and updates them as follows: 


m m] . : 4.37 
s р( | al h w if no resampling took place cen 


[m _ { 1 if resampling took place 

The choice of when to resample is intricate and requires practical experience: 
Resampling too often increases the risk of losing diversity. If one samples too 
infrequently, many samples might be wasted in regions of low probability. A 
standard approach to determining whether or not resampling should be performed 
is to measure the variance of the importance weights. The variance of the weights 
relates to the efficiency of the sample based representation. If all weights are 
identical, then the variance is zero and no resampling should be performed. If, on 
the other hand, the weights are concentrated on a small number of samples, then 
the weight variance is high and resampling should be performed. 


The second strategy for reducing the sampling error is known as low variance 
sampling. Table 4.4 depicts an implementation of a low variance sampler. The 
basic idea is that instead of selecting samples independently of each other in the 
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r г+М-! r+2M~!... 


Figure 4.3 Principle of the low variance resampling procedure. We choose a random 
number r and then select those particles that correspond to u = r + (m — 1)- M71 where 
m= 1,..., M. 


resampling process (as is the case for the basic particle filter in Table 4.3), the 
selection involves a sequential stochastic process. 


Instead of choosing M random numbers and selecting those particles that corre- 
spond to these random numbers, this algorithm computes a single random number 
and selects samples according to this number but still with a probability propor- 
tional to the sample weight. This is achieved by drawing a random number r in 
the interval [0; M ^! , where M is the number of samples to be drawn at time t. 
The algorithm in Table 4.4 then selects particles by repeatedly adding the fixed 
amount M ^! to r and by choosing the particle that corresponds to the resulting 
number. Any number u in [0; 1] points to exactly one particle, namely the particle 
i for which 


j 
i = argmin 5 ш" >u (4.38) 
j m=1 
The while loop in Table 4.4 serves two tasks, it computes the sum in the right- 
hand side of this equation and additionally checks whether ? is the index of the 
first particle such that the corresponding sum of weights exceeds u. The selection 
is then carried out in Line 12. This process is also illustrated in Figure 4.3. 


The advantage of the low-variance sampler is threefold. First, it covers the space 
of samples in a more systematic fashion than the independent random sampler. 
This should be obvious from the fact that the dependent sampler cycles through 
all particles systematically, rather than choosing them independently at random. 
Second, if all the samples have the same importance factors, the resulting sam- 
ple set Д, is equivalent to 2% so that no samples are lost if we resample without 
having integrated an observation into 1. Third, the low-variance sampler has 
a complexity of O(M). Achieving the same complexity for independent sam- 
pling is difficult; obvious implementations require a O(log M) search for each 
particle once a random number has been drawn, which results in a complexity of 
O(M log M) for the entire resampling process. Computation time is of essence 
when using particle filters, and often an efficient implementation of the resam- 
pling process can make a huge difference in the practical performance. For these 
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reasons, most implementations of particle filters in robotics tend to rely on mech- 
anisms like the one just discussed. 


In general, the literature on efficient sampling is huge. Another popular option 
is stratified sampling, in which particles are grouped into subsets. The number 
of samples in each subset can be kept the same over time, regardless of the total 
weight of the particles contained in each subset. Such techniques tend to perform 
well when a robot tracks multiple, distinct hypotheses with a single particle filter. 


. A third source of error pertains to the divergence of the proposal and target dis- 


tribution. We already hinted at the problem above, when discussing importance 
sampling. In essence, particles are generated from a proposal distribution that 
does not consider the measurement (cf., Equation (4.28)). The target distribution, 
which is the familiar Bayes filter posterior, depends of course on the measure- 
ment. The efficiency of the particle filter relies crucially on the ' match' between 
the proposal and the target distribution. If, at one extreme, the sensors of the 
robot are highly inaccurate but its motion is very accurate, the target distribution 
will be similar to the proposal distribution and the particle filter will be efficient. 
If, on the other hand, the sensors are highly accurate but the motion is not, these 
distributions can deviate substantially and the resulting particle filter can become 
arbitrarily inefficient. An extreme example of this would be a robot with deter- 
ministic sensors. For most deterministic sensors, the support of the measurement 
probability p(z | x) will be limited to a submanifold of the state space. For ex- 
ample, consider a mobile robot that performs localization with noise-free range 
sensors. Clearly, p(z | x) will be zero for almost every state x, with the ex- 
ceptions of those that match the range measurement z exactly. Such a situation 
can be fatal: the proposal distribution will practically never generate a sample 
x which exactly corresponds to the range measurement 2. Thus, all importance 
weights will be zero with probability one, and the resampling step becomes ill- 
conditioned. More generally, if p(z | x) is degenerate, meaning that its support 
is restricted to a manifold of a smaller dimension than the dimension of the state 
space, the plain particle filter algorithm is inapplicable. 


There exist a range of techniques for overcoming this problem. One simple- 
minded technique is to simply assume more noise in perception than there actu- 
ally is. For example, one might use a measurement model p(z | x) that overes- 
timates the actual noise in the range measurements. In many implementations, 
such a step improves the accuracy of the particle filter—despite the oddity of 
using a knowingly incorrect measurement probability. Other techniques involve 
modifications of the proposal distribution in ways that incorporate the measure- 
ment. Such techniques will be discussed in later chapters of this book. 


. A fourth and final disadvantage of the particle filter is known as the particle de- 


privation problem. When performing estimation in a high-dimensional space, 
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there may be no particles in the vicinity to the correct state. This might be be- 
cause the number of particles is too small to cover all relevant regions with high 
likelihood. However, one might argue that this ultimately must happen in any 
particle filter, regardless of the particle set size M. Particle deprivation occurs as 
the result of random resampling; an unlucky series of random numbers can wipe 
out all particles near the true state. At each resampling step, the probability for 
this to happen is larger than zero (although it is usually exponentially small in 
M). Thus, we only have to run the particle filter long enough. Eventually, we 
will generate an estimate that is arbitrarily incorrect. 


In practice, problems of this nature only tend to arise when M is small relative 
to the space of all states with high likelihood. A popular solution to this prob- 
lem is to add a small number of randomly generated particles into the set after 
each resampling process, regardless of the actual sequence of motion and mea- 
surement commands. Such a methodology can reduce (but not fix) the particle 
deprivation problem, but at the expense of an incorrect posterior estimate. The 
advantage of adding random samples lies in its simplicity: The software mod- 
ification necessary to add random samples in a particle filter is minimal. As a 
rule of thumb, adding random samples should be considered a measure of last re- 
sort, which should only be applied if all other techniques for fixing a deprivation 
problem have failed. 


This discussion showed that the quality of the sample based representation increases 
with the number of samples. An important question is therefore how many samples 
should be used for a specific estimation problem. Unfortunately, there is no perfect 
answer to this question and it is often left to the user to determine the required number 
of samples. As a rule of thumb, the number of samples strongly depends on the di- 
mensionality of the state space and the uncertainty of the distributions approximated 
by the particle filter. For example, uniform distributions require many more samples 
than distributions focused on a small region of the state space. А more detailed dis- 
cussion on sample sizes will be given in the context of robot localization, when we 
consider adaptive particle filters (see Section ??). 


4.3 SUMMARY 


This section introduced two nonparametric Bayes filters, histogram filters and particle 
filters. Nonparametric filters approximate the posterior by a finite number of values. 
Under mild assumptions on the system model and the shape of the posterior, both 
have the property that the approximation error converges uniformly to zero as the the 
number of values used to represent the posterior goes to infinity. 
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The histogram filter decomposes the state space into finitely many convex re- 
gions. Itrepresents the cumulative posterior probability of each region by a single 
numerical value. 


There exist many decomposition techniques in robotics. In particular, the granu- 
larity of a decomposition may or may not depend on the structure of the environ- 
ment. When it does, the resulting algorithms are often called "topological." 


Decomposition techniques can be divided into static and dynamic. Static decom- 
positions are made in advance, irrespective of the shape of the belief. Dynamic 
decompositions rely on specifics of the robot's belief when decomposing the state 
space, often attempting to increase spatial resolution in proportion to the poste- 
rior probability. Dynamic decompositions tend to give better results, but they are 
also more difficult to implement. 


An alternative nonparametric technique is known as particle filter. Particle filters 
represent posteriors by a random sample of states, drawn from the posterior. Such 
samples are called particles. Particle filter are extremely easy to implement, and 
they are the most versatile of all Bayes filter algorithms represented in this book. 


Specific strategies exist to reduce the error in particle filters. Among the most 
popular ones are techniques for reducing the variance of the estimate that arises 
from the randomness of the algorithm, and techniques for adapting the number 
of particles in accordance with the complexity of the posterior. 


The filter algorithms discussed in this and the previous chapter lay the groundwork 
for most probabilistic robotics algorithms discussed throughout the remainder of this 
book. The material presented here represents many of today's most popular algorithms 
and representations in probabilistic robotics. 


4.4 BIBLIOGRAPHICAL REMARKS 


ROBOT MOTION 


5.1 INTRODUCTION 


This and the next chapter describe the two remaining components for implementing 
the filter algorithms described thus far: the motion and the measurement models. This 
chapter focuses on the motion model. It provides in-depth examples of probabilistic 
motion models as they are being used in actual robotics implementations. These mod- 
els comprise the state transition probability p(x; | u;, 2,1), which plays an essential 
role in the prediction step of the Bayes filter. The subsequent chapter will describe 
probabilistic models of sensor measurements p(z; | x+), which are essential for the 
measurement update step. The material presented here will find its application in all 
chapters that follow. 


Robot kinematics, which is the central topic of this chapter, has been studied thor- 
oughly in past decades. However, is has almost exclusively been addressed in deter- 
ministic form. Probabilistic robotics generalizes kinematic equations to the fact that 
the outcome of a control is uncertain, due to control noise or unmodeled exogenous 
effects. Following the theme of this book, our description will be probabilistic: The 
outcome of a control will be described by a posterior probability. In doing so, the 
resulting models will be amenable to the probabilistic state estimation techniques de- 
scribed in the previous chapters. 


Our exposition focuses entirely on mobile robot kinematics for robots operating in 
planar environments. In this way, it is much more specific than most contemporary 
treatments of kinematics. No model of manipulator kinematics will be provided, nei- 
ther will we discuss models of robot dynamics. However, this restricted choice of 
material is by no means to be interpreted that probabilistic ideas are limited to kine- 
matic models of mobile robots. Rather, it is descriptive of the present state of the art, as 
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probabilistic techniques have enjoyed their biggest successes in mobile robotics, using 
models of the types described in this chapter. The use of more sophisticated probabilis- 
tic models (e.g., probabilistic models of robot dynamics) remains largely unexplored 
in the literature. Such extensions, however, are not infeasible. As this chapter illus- 
trates, deterministic robot actuator models are "probilified" by adding noise variables 
that characterize the types of uncertainty that exist in robotic actuation. 


In theory, the goal of a proper probabilistic model may appear to accurately model the 
specific types of uncertainty that exist in robot actuation and perception. In practice, 
the exact shape of the model often seems to be less important than the fact that some 
provisions for uncertain outcomes are provided in the first place. In fact, many of the 
models that have proven most successful in practical applications vastly overestimate 
the amount of uncertainty. By doing so, the resulting algorithms are more robust to 
violations of the Markov assumptions (Chapter 2.4.4), such as unmodeled state and the 
effect of algorithmic approximations. We will point out such findings in later chapters, 
when discussing actual implementations of probabilistic robotic algorithms. 


5.2 PRELIMINARIES 


5.2.1 Kinematic Configuration 


Kinematics is the calculus describing the effect of control actions on the configuration 
of a robot. The configuration of a rigid mobile robot is commonly described by six 
variables, its three-dimensional Cartesian coordinates and its three Euler angles (roll, 
pitch, yaw) relative to an external coordinate frame. The material presented in this 
book is largely restricted to mobile robots operating in planar environments, whose 
kinematic state, or pose, is summarized by three variables. This is illustrated in Fig- 
ure 5.1. The robot’s pose comprises its two-dimensional planar coordinates relative to 
an external coordinate frame, along with its angular orientation. Denoting the former 
as x and y (not to be confused with the state variable x+), and the latter by 0, the pose 
of the robot is described by the following vector: 


(5.1) 


Ф 
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«0.0» 


Figure 5.1 Robot pose, shown in a global coordinate system. 


The orientation of a robot is often called bearing, or heading direction. As shown in 
Figure 5.1, we postulate that a robot with orientation 0 — 0 points into the direction 
of its x-axis. A robot with orientation 0 = .57 points into the direction of its y-axis. 


Pose without orientation will be called location. The concept of location will be im- 
portant in the next chapter, when we discuss measures to perceive robot environments. 


For simplicity, locations in this book are usually described by two-dimensional vec- 
tors, which refer to the z-y coordinates of an object: 


x 
(=) вэ 


Sometimes we will describe locations in the full 3D coordinate frame. Both the pose 
and the locations of objects in the environment may constitute the kinematic state x; 
of the robot-environment system. 


5.2.2 Probabilistic Kinematics 


The probabilistic kinematic model, or motion model plays the role of the state transi- 
tion model in mobile robotics. This model is the familiar conditional density 


p(z« | Ut, 23-1) (5.3) 


Here x; and x;_; are both robot poses (and not just its x-coordinates), and и; is a 
motion command. This model describes the posterior distribution over kinematics 
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(a) (b) 


— 4 


Figure 5.2 The motion model: Posterior distributions of the robot's pose upon executing 
the motion command illustrated by the solid line. The darker a location, the more likely it 
is. This plot has been projected into 2D. The original density is three-dimensional, taking 
the robot's heading direction 0 into account. 


states that a robots assumes when executing the motion command u+ when its pose is 
2+1. In implementations, и; is sometimes provided by a robot's odometry. However, 
for conceptual reasons we will refer to и; as control. 


Figure 5.2 shows two examples that illustrate the kinematic model for a rigid mobile 
robot operating in a planar environment. In both cases, the robot's initial pose is 
ж—1. The distribution p(x; | ut, 2,1) is visualized by the grayly shaded area: The 
darker a pose, the more likely it is. In this figure, the posterior pose probability is 
projected into x-y-space, that is, the figure lacks a dimension corresponding to the 
robot’s orientation. In Figure 5.2a, a robot moves forward some distance, during which 
it may accrue translational and rotational error as indicated. Figure 5.2b shows the 
resulting distribution of a more complicated motion command, which leads to a larger 
spread of uncertainty. 


This chapter provides in detail two specific probabilistic motion models p(x; | 
Uz, 14.1), both for mobile robots operating in the plane. Both models are some- 
what complimentary in the type of motion information that is being processed. The 
first model assumes that the motion data и, specifies the velocity commands given to 
the robot's motors. Many commercial mobile robots (e.g., differential drive, synchro 
drive) are actuated by independent translational and rotational velocities, or are best 
thought of being actuated in this way. The second model assumes that one is provided 
with odometry information. Most commercial bases provide odometry using kine- 
matic information (distance traveled, angle turned). The resulting probabilistic model 
for integrating such information is somewhat different from the velocity model. 
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In practice, odometry models tend to be more accurate than velocity models, for the 
simple reasons that most commercial robots do not execute velocity commands with 
the level of accuracy that can be obtained by measuring the revolution of the robot's 
wheels. However odometry is only available post-the-fact. Hence it cannot be used for 
motion planning. Planning algorithms such as collision avoidance have to predict the 
effects of motion. Thus, odometry models are usually applied for estimation, whereas 
velocity models are used for probabilistic motion planning. 


5.3 VELOCITY MOTION MODEL 


The velocity motion model assumes that we can control a robot through two veloci- 
ties, a rotational and a translational velocity. Many commercial robots offer control 
interfaces where the programmer specifies velocities. Drive trains that are commonly 
controlled in this way include the differential drive, the Ackerman drive, the synchro- 
drive, and some holonomic drives (but not all). Drive systems not covered by our 
model are those without non-holonomic constraints, such as robots equipped with 
Mecanum wheels or legged robots. 


We will denote the translational velocity at time t by v+, and the rotational velocity by 
w+. Hence, we have 


uw = Ce) (5.4) 


We arbitrarily postulate that positive rotational velocities о; induce a counterclockwise 
rotation (left turns). Positive translational velocities v; correspond to forward motion. 


5.3.1 Closed Form Calculation 


A possible algorithm for computing the probability p(x, | uz, 2,1) is shown in Ta- 
ble 5.1. It accepts as input an initial pose 2; = (x y Ө)”, a control и, = (v w)?, 
and a hypothesized successor pose т, = (2’ y' 6')7. It outputs the probability 
р(2 | Ut, 211) of being at x, after executing control и; beginning in state x,_1, as- 
suming that the control is carried out for the fixed duration At. The parameters œ; to 
ов are robot-specific motion error parameters. This algorithm first calculates the con- 
trols of an error-free robot; the meaning of the individual variables in this calculation 
will become more apparent below, when we derive it. These parameters are given by 
ô and c. 
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(a) (b) (c) 


Figure 5.3 The velocity motion model, for different noise parameter settings. 


The function prob(z, b) models the motion error. It computes the probability of its 
parameter x under a zero-centered random variable with variance b. Two possible 
implementations are shown in Table 5.2, for error variables with normal distribution 
and triangular distribution, respectively. 


Figure 5.3 shows examples of this velocity motion model, projected into x-y-space. In 
all three cases, the robot sets the same translational and angular velocity. Figure 5.3a 
shows the resulting distribution with moderate error parameters o to ag. The dis- 
tribution shown in Figure 5.3b is obtained with smaller angular error (parameters a3 
and од) but larger translational error (parameters a; and о»). Figure 5.3c shows the 
distribution under large angular and small translational error. 


5.3.2 Sampling Algorithm 


For particle filters (cf. Section 4.2.1), it suffices to sample from the motion model 
p(zi | Ut, 211), instead of computing the posterior for arbitrary £+, и, and c, 1. 
Sampling from a conditional density is different than calculating the density: In sam- 
pling, one is given и; and z,. апа seeks to generate a random x, drawn according to 
the motion model p(x; | uz, 2,1). When calculating the density, one is also given т, 
generated through other means, and one seeks to compute the probability of x; under 


p(zi | Ut, 241). 


The algorithm sample. motion. model velocity in Table 5.3 generates random sam- 
ples from p(x; | uz, 2,1) for a fixed control u; and pose z,..,. It accepts 2; and и, 
as input and generates a random pose x; according to the distribution p(x; | uz, 2.1). 
Line 2 through 4 “perturb” the commanded control parameters by noise, drawn from 
the error parameters of the kinematic motion model. The noise values are then used 
to generate the sample's new pose, in Lines 5 through 7. Thus, the sampling pro- 
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1: Algorithm motion model velocity(z;, Ut, x; 1): 
И _ 1 (x — z') cos0 + (y — y') sin0 
` 2 (y — y)cos0 — (x — x')sin 0 
, 
Іт 
3: a* = —— tuy - y) 
, 
уту 
n y = ET (al — 2) 
5: r* = V(z —2*)? + (y—y*)? 
6: АӨ = atan2(y' — у“, a2’ — z*) — atan2(y — y*, x — x*) 
АӨ 
T: ô = — т“ 
At 
АӨ 
8: ð= — 
At 
Y a 0'—0 ^ 
9: =a Tw 
10: return prob(v — 6, a;|v| + ag|w|) * prob(w — Ф, a3|v| + o4|w]) 
: prob(4, a5|v| + ов|ш|) 
Table 5.1 Algorithm for computing p(x; | ut, 4-1) based on velocity information. 
Here we assume 2+1 is represented by the vector (x y 0)7; æ+ is represented by 
(a! y! 0')T ; and иу is represented by the velocity vector (v w)7 . The function prob(a, b) 
computes the probability of its argument a under a zero-centered distribution with variance 
b. It may be implemented using any of the algorithms in Table 5.2. 
Т: Algorithm prob normal distribution(a, b): 
1 _1а? 
2: return zz € 95 
3: Algorithm prob triangular distribution(a, b): 
4: if |a| > v 6b 
Э: return 0 
6: else 
T: return кош 


Table 5.2 Algorithms for computing densities of a zero-centered normal distribution and 
the triangular distribution with variance b. 
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Algorithm sample. motion model velocity(u;, 2; 1): 
0 = v + sample(o; |v| + a3 |w|) 

ô = w + sample(os|v| + a4|w]) 

4 = sample(as|v| + аво) 

п =х— P sin0 + 2 sin(0 + OA) 

у = y + $cos0 — 2 cos(0 + ФА?) 

0 = 0 GAL NYAC 


/ / / 
return x, = (a, y',0 


Б 


Table 5.3 Algorithm for sampling poses x; = (x' у’ 0')T from a pose 22-1 = 
(ж y 0)" and a control и, = (v w)T. Note that we are perturbing the fi nal orienta- 
tion by an additional random term, 4. The variables o1 through ag are the parameters of 
the motion noise. The function sample(b) generates a random sample from a zero-centered 


distribution with variance b. It may, for example, be implemented using the algorithms in 
Table 5.4. 


cd 


Algorithm sample normal distribution(5): 


12 
b 
return 6 a rand(-1, 1) 


1=1 


Algorithm sample triangular distribution(b): 


retum b. - rand(—1,1)- rand(-1,1) 


Table5.4 Algorithm for sampling from (approximate) normal and triangular distributions 
with zero mean and variance b. The function rand(z, y) is assumed to be a pseudo random 
number generator with uniform distribution in [z, y]. 
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(b) (c) 


Figure 5.4 Sampling from the velocity motion model, using the same parameters as in 
Figure 5.3. Each diagram shows 500 samples. 


cedure implements a simple physical robot motion model that incorporates control 
noise in its prediction, in just about the most straightforward way. Figure 5.4 illus- 
trates the outcome of this sampling routine. It depicts 500 samples generated by sam- 
ple. motion model velocity. The reader might want to compare this figure with the 
density depicted in in Figure 5.3. 


We note that in many cases, it is easier to sample x, than calculate the density of a 
given x+. This is because samples require only a forward simulation of the physical 
motion model. To compute the probability of a hypothetical pose amounts to retro- 
guessing of the error parameters, which requires to calculate the inverse of the physical 
motion model. The fact that particle filters rely on sampling makes them specifically 
attractive from an implementation point of view. 


5.3.3 Mathematical Derivation 


We will now derive the algorithms mo- 
tion model velocity and sample motion. model velocity. As usual, the reader not 
interested in the mathematical details is invited to skip this section at first reading, and 
continue in Section 5.4 (page 107). The derivation begins with a generative model of 
robot motion, and then derives formulae for sampling and computing p(x; | Ut, 241) 
for arbitrary £+, ut, and £41. 


Exact Motion 
Before turning to the probabilistic case, let us begin by stating the kinematics for an 


ideal, noise-free robot. Let и, = (v ш)” denote the control at time t. If both velocities 
are kept at a fixed value for the entire time interval (t— 1, t], the robot moves on a circle 
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<хо,ус> x 


Figure 5.5 Motion carried out by a noise-free robot moving with constant velocities v and 
w and starting at (x y 0)T. 


with radius 


(5.5) 


This follows from the general relationship between the translational and rotational 
velocities v and ш for an arbitrary object moving on a circular trajectory with radius т: 


0 = ш: т. (5.6) 


Equation (5.5) encompasses ће case where the robot does not turn at all (1.е., w = 0), 
in which case the robot moves on a straight line. A straight line corresponds to a circle 
with infinite radius, hence we note that r may be infinite. 


Let z;.1 = (2, y, 0)7 be the initial pose of the robot, and suppose we keep the velocity 


constant at (v ш)? for some time At. As one easily shows, the center of the circle is 
at 


те = т “sin (5.7) 


Ye = у + —соз@ (5.8) 
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The variables (x, yc)? denote this coordinate. After At time of motion, our ideal 
robot will be at т, = (a^, y',0^)7 with 


х! т. + Z 5іп(0 ФА?) 
y = Uc — 2 cos(0 + wAt) 
6' 0 - wt 
x —2 sind + 2 sin(0 + wAt) 
= y |+ 2 cos0 — > cos(0 + wAt) (5.9) 
0 wAt 


The derivation of this expression follows from simple trigonometry: After At units 
of time, the noise-free robot has progressed v - At along the circle, which caused it’s 
heading direction to turn by ш. At. At the same time, its x and y coordinate is given 
by the intersection of the circle about (xe ye)”, and the ray starting at (£e Ye)? at the 
angle perpendicular to ш. At. The second transformation simply substitutes (5.8) into 
the resulting motion equations. 


Of course, real robots cannot jump from one velocity to another, and keep velocity 
constant in each time interval. To compute the kinematics with non-constant veloci- 
ties, it is therefore common practice to use small values for At, and to approximate 
the actual velocity by a constant within each time interval. The (approximate) final 
pose is then obtained by concatenating the corresponding cyclic trajectories using the 
mathematical equations just stated. 


Real Motion 


In reality, robot motion is subject to noise. The actual velocities differ from the com- 
manded ones (or measured ones, if the robot possesses a sensor for measuring ve- 
locity). We will model this difference by a zero-centered random variable with finite 
variance. More precisely, let us assume the actual velocities are given by 


= v Eay|v|+a2|w| 
= Ж 5.10 
) ( w ) ( Éos|v|d-o || ) i | 


Here £, is a zero-mean error variable with variance b. Thus, the true velocity equals 
the commanded velocity plus some small, additive error (noise). In our model, the 
variance of the error is proportional to the commanded velocity. The parameters a‘, to 
ол (with a; > 0 for i = 1,...,4) are robot-specific error parameters. They model the 
accuracy of the robot. The less accurate a robot, the larger these parameters. 


М 
б е 
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(a) (b) 


Figure 5.6 Probability density functions with variance b: (a) Normal distribution, (b) 
triangular distribution. 


Two common choices for the error £, are: 


= Normal distribution. The normal distribution with zero mean and variance b is 
given by the density function 
1 1 а? 
&(a) = e 29 (5.11) 
V27-b 
Figure 5.6a shows the density function of a normal distribution with variance b. 
Normal distributions are commonly used to model noise in continuous stochastic 
processes, despite the fact that their support, that is the set of points a with p(a) > 
0, is R. 


= Triangular distribution. The density of triangular distribution with zero mean 
and variance b is given by 


0 if |a] > V6b 
us -{ SS otherwise mde 


which is non-zero only in (— v 6b; М6). As Figure 5.6b suggests, the density 
resembles the shape of a symmetric triangle—hence the name. 


A better model of the actual pose т, = (a’ y’ 0')T after executing the motion com- 


mand ш = (v ш)” аіл, 1 = (x y 0)? is thus 
x x —2 sin0 + 2 sin(0 + GAt) 
y' = | y |+| £$cos0— 2 cos(0 + GA) (5.13) 
6' 0 At 


Robot Motion 103 


This equation is simply obtained by substituting the commanded velocity и, = (v w)7 


with the noisy motion (ô ®) in (5.9). However, this model is still not very realistic, 
for reasons discussed in turn. 


Final Orientation 


The two equations given above exactly describe the final location of the robot given 
that the robot actually moves on an exact circular trajectory with radius r — 2. While 
the radius of this circular segment and the distance traveled is influenced by the con- 
trol noise, the very fact that the trajectory is circular is not. The assumption of cir- 
cular motion leads to an important degeneracy. In particular, the support of the den- 
sity p(az | ur, 2.1) is two-dimensional, within a three-dimensional embedding pose 
space. The fact that all posterior poses are located on a two-dimensional manifold 
within the three-dimensional pose space is a direct consequence of the fact that we 
used only two noise variables, one for v and one for w. Unfortunately, this degeneracy 
has important ramifications when applying Bayes filters for state estimation. 


In reality, any meaningful posterior distribution is of course not degenerate, and poses 
can be found within a three-dimensional space of variations in z, y, and 0. To gener- 
alize our motion model accordingly, we will assume that the robot performs a rotation 


^ when it arrives at its final pose. Thus, instead of computing 6’ according to (5.13), 
we model the final orientation by 


0 = 6+4+@At+4At (5.14) 
with 
F = Eas|v|ta6|w| (5.15) 


Here as and og are additional robot-specific parameters that determine the variance 
of the additional rotational noise. Thus, the resulting motion model is as follows: 


m z —2 sin + 2 sin(0 + GAt) 
yl = y |+| £cos0— 2 cos(0 + ФА) (5.16) 
9! 0 wAt+4At 


Computation of p(x; | uz, 24-1) 
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The algorithm motion model velocity in Table 5.1 implements the computation of 
p(r: | Ut, 2:1) for given values of 74-1 = (x y 0)", ш = (v w)f, and 
x, = (a! y! 0')T. The derivation of this algorithm is somewhat involved, as it ef- 
fectively implements an inverse motion model. In particular, motion. model. velocity 
determines motion parameters à, = (6 @) from the poses z,.., and тү, along with 
an appropriate final rotation 4. Our derivation makes it obvious as to why a final ro- 
tation is needed: For most values of 74-1, uz, and x+, the motion probability would 
simply be zero without allowing for a final rotation. 


Let us calculate the probability p(x, | u;,r; 1) of control action и, = (v w 
carrying the robot from the pose 2; = (x y Ө)” to the pose x; = (x! у 0' 
within At time units. To do so, we will first determine the control i = (6 ô 
required to carry the robot from 2,1 to position (x’ y^), regardless of the robot's 
final orientation. Subsequently, we will determine the final rotation ^ necessary for 
the robot to attain the orientation 6’. Based on these calculations, we can then easily 
calculate the desired probability p(x; | uz, x,..1). 


The reader may recall that our model assumes that the robot assumes a fixed velocity 
during At, resulting in a circular trajectory. For a robot that moved from x4; = 
(ж y 0)" toa, = (a' y"), the center of the circle is defined as (x* y*)7 and given 


by 
UE TE —Asind Y  ( 2 + ayy’) 
( y* ) К ( y )+( A cos 0 ) Е | VH. + (a! — a) (5.17) 


for some unknown А, и € R. The first equality is the result of the fact that the cir- 
cle's center is orthogonal to the initial heading direction of the robot; the second is 
a straightforward constraint that the center of the circle lies on a ray that lies on the 
half-way point between (x y)" and (x’ y')" and is orthogonal to the line between 
these coordinates. 


Usually, Equation (5.17) has a unique solution—except in the degenerate case of w = 
0, in which the center of the circle lies at infinity. As the reader might want to verify, 
the solution is given by 


(5.18) 
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and hence 


zr—2/')cos0--(y—y') sind 
е (у= y) 519 
(a—a’) cos 0--(y—y') sind ( / x) ( Е ) 
(y—y’) соѕ0— (2—2) sin 0 


М 
о 8 
* ж 
Ми 
М 
ENT 

юрт 
d 
NIF ple 


The radius of the circle is now given by the Euclidean distance 


P= PEEP eG rt a= Vere yey (5.20) 
Furthermore, we can now calculate the change of heading direction 
АӨ = atan2(y' — y*,a! — x*) — atan2(y — y*, x — x*) (5.21) 


Here atan2 is the common extension of the arcus tangens of y/z extended to the R? 
(most programming languages provide an implementation of this function): 


atan(y/z) ifr0 
i — at ifz <0 
atan2(y, 2) m nw) (т а ап(|у/х|)) T ie (5.22) 
sign(y) 7/2 ifr—0,y20 


Since we assume that the robot follows a circular trajectory, the translational distance 
between x; and х; (along this circle) is 


Adist = r* - A0 (5.23) 


From Adist and A, it is now easy to compute the velocities 6 and à: 


Е i Adist 
) = At ( A ) (5.24) 


=> 
Il 
AL cC 
& е 


The angle of the final rotation 5 сап be determined according to (5.14) as: 


4 = At !l(9—0)-ó (5.25) 
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The motion error is the deviation of ti; and 4 from the commanded velocity и, = 
(и w)? and y = 0, as defined in Equations (5.24) and (5.25). 


Uer = U= © (5.26) 
Wer = ш—@ (5.27) 
Yerr = ү (5.28) 


Under our error model, specified in Equations (5.10), and (5.15), these errors have the 
following probabilities: 


Soy; оры era) (5.29) 
£os|v|--ou uw] (err) (5.30) 
Басре) (5.31) 


where єъ denotes а zero-mean error variable with variance b, as before. Since we 
assume independence between the different sources of error, the desired probability 
p(zi | us, 21) is the product of these individual errors: 


p(zxi|unczii) = £o, |v|-+ag|u| (Verr) * £os|v|-Fou w| (err) ` €os|v| cago] r2) 


To see the correctness of the algorithm motion model velocity in Table 5.1, the reader 
may notice that this algorithm implements this expression. More specifically, lines 2 
to 9 are equivalent to Equations (5.18), (5.19), (5.20), (5.21), (5.24), and (5.25). Line 
10 implements (5.32), substituting the error terms as specified in Equations (5.29) to 
(5.31). 


Sampling from p(s’ | a, s) 


The sampling algorithm sample. motion model velocity in Table 5.3 implements a 
forward model, as discussed earlier in this section. Lines 5 through 7 correspond 
to Equation (5.16). The noisy values calculated in lines 2 through 4 correspond to 
Equations (5.10) and (5.15). 


The algorithm sample. normal distribution in Table 5.4 implements a common ap- 
proximation to sampling from a normal distribution. This approximation exploits 
the central limit theorem, which states that any average of non-degenerate random 
variables converges to a normal distribution. By averaging 12 uniform distribu- 
tions, sample normal distribution generates values that are approximately normal 
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Srot2 


Otrans 


Figure 5.7 Odometry model: The robot motion in the time interval (t — 1, t] is approxi- 
mated by a rotation óroc1, followed by a translation dtrans and a second rotation ôrot2. The 
turns and translation are noisy. 


distributed; though technically the resulting values lie always in [—2b, 2b]. Finally, 
sample triangular distribution in Table 5.4 implements a sampler for triangular dis- 
tributions. 


5.4 ODOMETRY MOTION MODEL 


The velocity motion model discussed thus far uses the robot's velocity to compute pos- 
teriors over poses. Alternatively, one might want to use the odometry measurements 
as the basis for calculating the robot's motion over time. Odometry is commonly 
obtained by integrating wheel encoders information; most commercial robots make 
such integrated pose estimation available in periodic time intervals (e.g., every tenth 
of a second). Practical experience suggests that odometry, while still erroneous, is 
usually more accurate than velocity. Both suffer from drift and slippage, but veloc- 
ity additionally suffers from the mismatch between the actual motion controllers and 
its (crude) mathematical model. However, odometry is only available in retrospect, 
after the robot moved. This poses no problem for filter algorithms, but makes this 
information unusable for accurate motion planning and control. 
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1: Algorithm motion model odometry(z;, ut, x; 1): 


2: бо = atan2(y' — g,3' — 3) — 0 


ÜÓtrans = NIU = zy F (y z. у')* 
Oro 2 — 6’ ET 0 — Óroti 


froti = atan2(y' — у, – х) — 0 
di: = VC ES ge F (y m y? 


буо 2 — 0 —0— drott 


Рі = prob(ó,o m opis доа UN 02 dtrans) 
9: P2 = prob (dtrans = Otrans: Q3 ÜÓtrans + Q4 (Orot1 sf drot2)) 
10: pa = prob(6;ot2 = Órot2; O19rot2 EE Q2Ôtrans) 
11: return pı · po · pa 


Table 5.5 Algorithm for computing р(2+ | ut, 2.1) based on odometry information. 
Here the control и is 


5.4.1 Closed Form Calculation 


This section defines an alternative motion model that uses odometry measurements 
in lieu of controls. Technically, odometry are sensor measurements, not controls. To 
model odometry as measurements, the resulting Bayes filter would have to include the 
actual velocity as state variables—which increases the dimension of the state space. 
To keep the state space small, it is therefore common to simply consider the odometry 
as if it was a control signal. In this section, we will do exactly this, and treat odometry 
measurements as controls. The resulting model is at the core of many of today's best 
probabilistic robot systems. 


Let us define the format of our control information. At time t, the correct pose of 
the robot is modeled by the random variable x;. The robot odometry estimates this 
pose; however, due to drift and slippage there is no fixed coordinate transformation 
between the coordinates used by the robot's internal odometry and the physical world 
coordinates. In fact, knowing this transformation would solve the robot localization 
problem! 
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(a) (b) (c) 


Figure 5.8 Тһе odometry motion model, for different noise parameter settings. 


The odometry model uses the relative information of the robot's internal odometry. 
More specifically, In the time interval (t — 1, t], the robot advances from a pose 2,1 
to pose тү. The odometry reports back to us a related advance from 7; = (2 ў 0) 
to z, = (2° y' 0'). Here the bar indicates that these are odometry measurements, 
embedded in a robot-internal coordinate whose relation to the global world coordinates 
is unknown. The key insight for utilizing this information in state estimation is that the 
relative difference between r,., and z+, under an appropriate definition of the term 
“difference,” is a good estimator for the difference of the true poses ж,—1 and x+. The 
motion information uz is, thus, given by the pair 


TES Ge (5.33) 


To extract relative odometry, u; is transformed into a sequence of three steps: a rota- 
tion, followed by a straight line motion (translation) and another rotation. Figure 5.7 
illustrates this decomposition: the initial turn is called 6,041, the translation dtrans, and 
the second rotation $, „кх. As the reader easily verifies, each pair of positions (5 s") 
has a unique parameter vector (d;ot1 trans ôrot2) 7, and these parameters are suffi- 
cient to reconstruct the relative motion between s and 5”. Thus, б.о, Otrans; Ôrot2 i8 a 
sufficient statistics of the relative motion encoded by the odometry. Our motion model 
assumes that these three parameters are corrupted by independent noise. The reader 
may note that odometry motion uses one more parameter than the velocity vector de- 
fined in the previous section, for which reason we will not face the same degeneracy 
that led to the definition of a "final rotation." 


Before delving into mathematical detail, let us state the basic algorithm for calculating 
this density in closed form. Table 5.5 depicts the algorithm for computing p(x, | 
Ut, 2.1) from odometry. This algorithm accepts as an input an initial pose 1; 1, а 
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1: Algorithm sample motion model odometry(u;, x;..1): 


2: бо = atan2(y’ — g,3/ — 3) — 0 


Otrans = Wats — 2')? + (y р y’): 
dro 2 — 6! P 0— бов 


Oro i Óroti m sample(o1óroti T Q2Ôtrans) 


Otrans = Otrans = sample(a3 Otrans + QA (Oros + Órot2)) 


Oro 2 — Órot2 sample(o1óroto EI Q2Ó0trans) 


m 1+ Dos cos(0 + door) 


9: y =? fians sin(0 + drott) 
10: 0  —0-4 боа an б.о 
11: return x, = (2',y', 0’)? 


Table 5.6 Algorithm for sampling from p(z« | ut, 2,1) based on odometry information. 
Here the pose at time t is represented by 74-1 = (a у 0)". The control is a differentiable 
set of two pose estimates obtained by the robot’s odometer, u; = (2—1 z)”, with 
2+1 = (2 J 0) and % = (a ӯ! 0"). 


pair of poses и, = (z,.., Z+)” obtained from the robot's odometry, and a hypothesized 
final pose x+. It outputs the numerical probability p(z; | uz, 2,1). 


Let us dissect this algorithm. Lines 2 to 4 recover relative motion parame- 
ters (drot1  Otrans б) from the odometry readings. As before, they im- 
plement an inverse motion model. The corresponding relative motion parameters 
(бон Ôtrans боо) Т for the given poses 2; and т, are calculated in Lines 5 through 
7 of this algorithm. Lines 8 to 10 compute the error probabilities for the individual 
motion parameters. As above, the function prob(a, b) implements an error distribu- 
tion over a with zero mean and variance b. Here the implementer must observe that all 
angular differences must lie in [—7, т]. Hence the outcome of боо — Srot2 has to be 
truncated correspondingly—a common error that tends to yield occasional divergence 
of software based on this model. Finally, Line 11 returns the combined error proba- 
bility, obtained by multiplying the individual error probabilities pı, po, and рз. This 
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(b) 


Figure 5.9 Sampling from the odometry motion model, using the same parameters as in 
Figure 5.8. Each diagram shows 500 samples. 


last step assumes independence between the different error sources. The variables a; 
through o4 are robot-specific parameters that specify the noise in robot motion. 


Figure 5.8 shows examples of our odometry motion model for different values of the 
error parameters o, to a4. The distribution in Figure 5.8a is a “proto-typical” one, 
whereas the ones shown in Figures 5.8b and 5.8c indicate unusually large transla- 
tional and rotational errors, respectively. The reader may want to carefully compare 
these diagrams with those in Figure 5.3 on page 96. The smaller the time between to 
consecutive measurements, the more similar those different motion models. Thus, if 
the belief is updated frequently e.g., every tenth of a second for a conventional indoor 
robot, the difference between these motion models is not very significant. In gen- 
eral, the odometry model is preferable to the velocity model when applicable, since 
odometers are usually more accurate than velocity controls—especially if those veloc- 
ity values are not sensed but instead submitted to a PID controller that sets the actual 
motor currents. 


5.4.2 Sampling Algorithm 


If particle filters are used for localization, we would also like to have an algorithm for 
sampling from р(2 | ui, x11). Recall that particle filters (cf. Chapter 4.2.1) require 
samples of p(x, | uz, x..1), rather than a closed-form expression for computing p(x; | 
Ut, 201) for any 2,1, ие, and x+. The algorithm sample. motion model. odometry, 
shown in Table 5.6, implements the sampling approach. It accepts an initial pose z,..1 
and an odometry reading и; as input, and outputs a random =; distributed according 
to р(2 | ui, 1-1). It differs from the previous algorithm in that it randomly guesses 
a pose x, (Lines 5-10), instead of computing the probability of a given x,. As before, 
the sampling algorithm sample. motion model odometry is somewhat easier to im- 
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Start location 


hs 


10 meters 


Figure 5.10 Sampling approximation of the position belief for a non-sensing robot. The 


solid line displays the actions, and the samples represent the robot's belief at different points 
in time. 


plement than the closed-form algorithm motion. model odometry, since it side-steps 
the need for an inverse model. 


Figure 5.9 shows exam- 
ples of sample sets generated by sample motion model odometry, using the same 
parameters as in the model shown in Figure 5.8. Figure 5.10 illustrates the motion 
model “in action" by superimposing sample sets from multiple time steps. This data 
has been generated using the motion update equations of the algorithm particle filter 
(Table 4.3), assuming the robot's odometry follows the path indicated by the solid line. 
The figure illustrates how the uncertainty grows as the robot moves. The samples are 
spread across an increasingly larger space. 
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5.4.3 Mathematical Derivation 


The derivation of the algorithms is relatively straightforward, and once again may be 
skipped at first reading. To derive a probabilistic motion model using odometry, we 
recall that the relative difference between any two poses is represented by a concatena- 
tion of three basic motions: a rotation, a straight-line motion (translation), and another 
rotation. The following equations show how to calculate the values of the two rotations 
and the translation from the odometry reading u; = (Z+—1 2), with z; = (2 jj 0) 


! wp, 


and т, = (z^ y' 0^: 


бол = atan2(y – 9,2 – 2) – 0 (5.34) 
trans = V(E—-2#/)2+(9-9') (5.35) 
drot2 = 0' сав 0 m Óroti (5.36) 


To model the motion error, we assume that the “true” values of the rotation and trans- 
lation are obtained from the measured ones by subtracting independent noise £y with 
zero mean and variance b: 


Óroti zx Óroti = Eay|brot1|+a2|Strans| (5.37) 
Ótrans = — Ótrans — Eas |Serans|-+04|drot1 +5rotal (5.38) 
Órot2 =  Órot2 — £e |6, |+аз [ба| (5.39) 


As in the previous section, єъ is a zero-mean noise variable with variance b (e.g., with 
normal or triangular distribution). The parameters o to од are robot-specific error 
parameters, which specify the error accrued with motion. 


Consequently, the true position, x+, is obtained from x;_, by an initial rotation with 
angle NON followed by a translation with distance бузыла, followed by another rotation 
with angle ee Thus, 


т' т Îtrans cos(0 uS буо) 
у T y Ег Otrans sin(0 br Orot1) (5.40) 
0 0 0 + Óroti T Órot2 


Notice that algorithm sample motion model odometry implements Equations (5.34) 
through (5.40). 
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The algorithm motion model odometry is obtained by noticing that Lines 5-7 com- 
pute the motion parameters буг, Ôtrans, and ee for the hypothesized pose z;, rela- 
tive to the initial pose 2,1. The difference of both, 


бо — бео (5.41) 
Ütrans = бей (5.42) 
Órot2 = Órot2 (5.43) 


is the error in odometry, assuming of course that x, is the true final pose. The error 
model (5.37) to (5.39) implies that the probability of these errors is given by 


Di =  Eay|6rot1|+o2|Strans| (Orot1 — Órot1) (5.44) 
p2 = Eas |Serans|+er4|Sror1t6rov2| (Strans — Strans) (5.45) 
pa = Eay|brot2|+a2|Strans| (ово az Órot2) (5.46) 


with the distributions = defined as above. These probabilities are computed in Lines 
8-10 of our algorithm motion model. odometry, and since the errors are assumed to 
be independent, the joint error probability is the product ру. p» - рз (cf., Line 11). 


5.5 MOTION AND MAPS 


By considering p(x; | Uut, 2.1), we defined robot motion in a vacuum. In particular, 
this model describes robot motion in the absence of any knowledge about the nature 
of the environment. In many cases, we are also given a map т, which may contain 
information pertaining to the places that a robot may or may not be able to navigate. 
For example, occupancy maps, which will be explained in Chapter ??, distinguish 
free (traversable) from occupied terrain. The robot's pose must always be in the free 
space. Therefore, knowing m gives us further information about the robot pose x; 
before, during, and after executing a control u+. 


This consideration calls for a motion model that takes the map m into account. We 
will write this model as p(x, | ш, 2—1, n), indicating that it considers the map m in 
addition to the standard variables. If m carries information relevant to pose estimation, 
we have 


p(zi | Ut, 241) * p(zi | ut, 244, M) (5.47) 
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The motion model p(z, | и, 2,1,7) should give better results than the map-free 
motion model p(x; | ut, 2.1). We will refer to p(z; | ui, xi 1, m) as map-based 
motion model. The map-based motion model computes the likelihood that a robot 
placed in a world with map m arrives at pose x; upon executing action и at pose 
231. Unfortunately, computing this motion model in closed form is difficult. This 
is because to compute the likelihood of being at x+ after executing action u;, one has 
to incorporate the probability that an unoccupied path exists between x;_1 and х; and 
that the robot might have followed this unoccupied path when executing the control 
t4;—a complex operation. 


Luckily, there exists an efficient approximation for the map-based motion model, 
which works well if the distance between х; and x; is small (e.g., smaller than 
half a robot diameter). The approximation factorizes the map-based motion model 
into two components: 


p(x | Ut, e—-1,™M) = n p(x | Ut, 241) p(zi | т) (5.48) 


where 77 is the usual normalizer. According to this factorization, we simply multiply 
the map-free estimate p(z; | uz, ж,—1) with a second term, p(x; | m), which expresses 
the “consistency” of pose x; with the map m. In the case of occupancy maps, p(x: | 
m) = 0 if and only if the robot “collides” with an occupied grid cell in the map; 
otherwise it assumes a constant value. By multiplying p(x+ | m) and p(z; | we, x1), 
we obtain a distribution that assigns all probability mass to poses z, , consistent 
with the map, which otherwise has the same shape as p(x; | ш, 2:1). As 7 can be 
computed by normalization, this approximation of a map-based motion model can be 
computed efficiently without any significant overhead compared to a map-free motion 
model. 


Table 5.7 states the basic algorithms for computing and for sampling from the map- 
based motion model. Notice that the sampling algorithm returns a weighted sample, 
which includes an importance factor proportional to p(z, | m). Care has to be taken in 
the implementation of the sample version, to ensure termination of the inner loop. Ап 
example of the motion model is illustrated in Figure 5.11. The density in Figure 5.11a 
is p(x; | Ut, 24.1), computed according to the velocity motion model. Now suppose 
the map m possesses a long rectangular obstacle, as indicated in Figure 5.11b. The 
probability р(2 | m) is zero at all poses 1; where the robot would intersect the 
obstacle. Since our example robot is circular, this region is equivalent to the obstacle 
grown by a robot radius (this is equivalent to mapping the obstacle from workspace to 
the robot's configuration space [19] or pose space). The resulting probability p(z; | 
Uz, 2.1, M), shown in Figure 5.11b, is the normalized product of p(x; | m) and p(x | 
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1: Algorithm motion model with map(z;, uz, x11, M): 

2: return plore | илал) + prs | m) 

1: Algorithm sample. motion model. with map(u;, zr, 1, m): 
2; do 

3: ж = sample. motion model(u;, z; 1) 

3: T= p(z: | m) 

4: until 1 > 0 

5: return (x4, T) 


Table 5.7 Algorithm for computing p(z« | ut, 2.1, т), which utilizes a map m of the 
environment. This algorithms bootstraps previous motion models (Tables 5.1, 5.3, 5.5, and 
5.6) to models that take into account that robots cannot be placed in occupied space in the 
map m. 


Ut, 201). It is zero in the extended obstacle area, and proportional to p(x; | Ut, 241) 
everywhere else. 


Figure 5.11 also illustrates a problem with our approximation. The region marked (*) 
possesses non-zero likelihood, since both p(x, | ut, ;1) and р(х; | m) are non- 
Zero in this region. However, for the robot to be in this particular area it must have 
gone through the wall, which is impossible in the real world. This error is the result 
of checking model consistency at the final pose x;_ only, instead of verifying the 
consistency of the robot's path to the goal. In practice, however, such errors only occur 
for relatively large motions u;, and it can be neglected for higher update frequencies. 


To shed light onto the nature of the approximation, let us briefly derive it. Equation 
(5.48) can be obtained by applying Bayes rule: 


p(zi | Ut, Lt-1,M) E n p(m | Lt, Ut, a og) p(zi | Ut, 21-1) (5.49) 


If we approximate p(m | £t, Ut, zi 1) by p(m | 21), we obtain the desired equation 
by once again applying Bayes rule: 


P(T | ui %t-1,mM) = m p(m | 2) plti | Ue, zii) 
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(a) p(w | ut, 221) (b) p(z« | ut, zt—1, m) 
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Figure 5.11 Velocity motion model (a) without a map and (b) conditioned on a map m. 


= p(x, | m) p(zi | ur, 241) (5.50) 


where 77 is the normalizer (notice that the value of э is different for the different steps 
in our transformation). This brief analysis shows that our map-based model is justified 
under the rough assumption that 


p(m|ziu,zi-1) = p(m |x) (5.51) 


Obviously, these expressions are not equal. When computing the conditional over m, 
our approximation omits two terms: и; and x+. By omitting these terms, we discard 
any information relating to the robot's path leading up to x+. All we know is that its 
final pose is 2;. We already noticed the consequences of this omission in our example 
above, when we observed that poses behind a wall may possess non-zero likelihood. 
Our approximate map-based motion model may falsely assume that the robot just went 
through a wall, as long as the initial and final poses are in the unoccupied space. How 
damaging can this be? As noted above, this depends on the update interval. In fact, 
for sufficiently high update rates, and assuming that the noise variables in the motion 
model are bounded, we can guarantee that the approximation is tight and this effect 
will not occur. 


This analysis illustrates a subtle insight pertaining to the implementation of the algo- 
rithm. In particular, one has to pay attention to the update frequency. A Bayes filter 
that is updated frequently might yield fundamentally different results than one that is 
updated occasionally only. 
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5.6 SUMMARY 


This section derived the two principal probabilistic motion models for mobile robots 
operating on the plane. 


m We derived an algorithm for the probabilistic motion model р(х; | uz, 241) 
that represents control и; by a translational and angular velocity, executed over а 
fixed time interval At. In implementing this model, we realized that two control 
noise parameters, one for the translational and one for the rotational velocity, 
are insufficient to generate a space-filling (non-generate) posterior. We therefore 
added a third noise parameter, expressed as a noisy "final rotation." 


m We presented an alternative motion model that uses the robot's odometry as input. 
Odometry measurements were expressed by three parameters, an initial rotation, 
followed by a translation, and a final rotation. The probabilistic motion model 
was implemented by assuming that all three of these parameters are subject to 
noise. We noted that odometry readings are technically not controls; however, by 
using them just like controls we arrived at a simpler formulation of the estimation 
problem. 


m For both motion models, we presented two types of implementations: One in 
which the probability p(x, | u;,x;..1) is calculated in closed form, and one that 
enables us to generate samples from p(x | ut, 211). The closed-form expres- 
sion accepts as an input £+, uz, and z, ,, and outputs a numerical probability 
value. To calculate this probability, the algorithms effectively invert the motion 
model, to compare the actual with the commanded control parameters. The sam- 
pling model does not require such an inversion. Instead, it implements a forward 
model of the motion model p(x; | ut, z;..1). It accepts as an input the values и; 
and 2, and outputs a random z; drawn according to p(x; | uz, x11). Closed- 
form models are required for some probabilistic algorithms. Others, most notably 
particle filters, utilize sampling models. 


= Finally we extended all motion models to incorporate a map of the environment. 
The resulting probability p(x; | uz, 2.1, m) incorporates a map m in its condi- 
tional. This extension followed the intuition that the map specifies where a robot 
may be; which has an effect of the ability to move from pose x;_1 to x+. Our 
extension was approximate, in that we only checked for the validity of the final 
pose. 


The motion models discussed here are only examples: Clearly, the field of robotic 
actuators is much richer than just mobile robots operating in flat terrain. Even within 
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the field of mobile robotics, there exist a number of devices that are not covered by 
the models discussed here. Examples include holonomic robots which can move side- 
wards. Further, our description does not consider robot dynamics, which are impor- 
tant for fast-moving vehicles such as cars on highways. Most of these robots can be 
modeled analogously: simply specify the physical laws of robot motion, and spec- 
ify appropriate noise parameters. For dynamic models, this will require to extend the 
robot state by a velocity vector which captures the dynamic state of the vehicle. In 
many ways, these extensions are straightforward. Rather than cluttering this book 
with more motion models, it is now time to move on to the important topic of sensor 
measurements. 


5.7 BIBLIOGRAPHICAL REMARKS 


Typical drives covered by this model are the the differential drive, the Ackerman drive, 
or the synchro-drive [4, ?]. Drive systems not covered by our model are those without 
non-holonomic constraints [19] like robots equipped with Mecanum wheels [4, ?] or 
even legged robots. 
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MEASUREMENTS 


6.1 INTRODUCTION 


Measurement models comprise the second domain-specific model in probabilistic 
robotics, next to motion models. Measurement models describe the formation pro- 
cess by which sensor measurements are generated in the physical world. Today's 
robots use a variety of different sensor modalities, such as tactile sensors, range sen- 
sors, or cameras. The specifics of the model depends on the sensor: Imaging sensors 
are best modeled by projective geometry, whereas sonar sensors are best modeled by 
describing the sound wave and its reflection on surfaces in the environment. 


Probabilistic robotics explicitly models the noise in sensor measurements. Such mod- 
els account for the inherent uncertainty in the robot's sensors. Formally, the measure- 
ment model is defined as a conditional probability distribution p(z, | £+, m), where 
2} 15 the robot pose, 2+ is the measurement at time t, and m is the map of the en- 
vironment. Although we mainly address range-sensors throughout this section, the 
underlying principles and equations are not limited to this type of sensors. Instead the 
basic principle can be applied to any kind of sensor, such as a camera or a bar-code 
operated landmark detector. 


To illustrate the basic problem of mobile robots that use their sensors to perceive their 
environment, Figure 6.1 shows a typical range-scan obtained in a corridor with a mo- 
bile robot equipped with a cyclic array of 24 ultrasound sensors. The distances mea- 
sured by the individual sensors are depicted in black and the map of the environment 
is shown in light grey. Most of these measurements correspond to the distance of the 
nearest object in the measurement cone; some measurements, however, have failed to 
detect any object. The inability for sonar to reliably measure range to nearby objects is 
often paraphrased as sensor noise. Technically, this noise is quite predictable: When 
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Figure 6.1 Typical ultrasound scan of a robot in its environment. 


measuring smooth surfaces (such as walls) at an angle, the echo tends to travel into a 
direction other than the sonar sensor, as illustrated in Figure ??. This effect is called 
specular reflection and often leads to overly large range measurements. The likelihood 
of specular reflection depends on a number of properties, such as the surface material 
and angle, the range of the surface, and the sensitivity of the sonar sensor. Other errors, 
such as short readings, may be caused by cross-talk between different sensors (sound 
is slow!) or by unmodeled objects in the proximity of the robot, such as people. 


As a rule of thumb, the more accurate a sensor model, the better the results—though 
there are some important caveats that were already discussed in Chapter 2.4.4. In prac- 
tice, however, it is often impossible to model a sensor accurately, primarily for two rea- 
sons: First, developing an accurate sensor model can be extremely time-consuming, 
and second, an accurate model may require state variables that we might not know 
(such as the surface material). Probabilistic robotics accommodates the inaccuracies 
of sensor models in the stochastic aspects: By modeling the measurement process 
as a conditional probability density, р( | x+), instead of a deterministic function 
zt = f(z), the uncertainty in the sensor model can be accommodated in the non- 
deterministic aspects of the model. Herein lies a key advantage of probabilistic tech- 
niques over classical robotics: in practice, we can get away with extremely crude mod- 
els. However, when devising a probabilistic model, care has to be taken to capture the 
different types of uncertainties that may affect a sensor measurement. 


Many sensors generate more than one numerical measurement value when queried. 
For example, cameras generate entire arrays of values (brightness, saturation, color); 
similarly, range finders usually generate entire scans of ranges. We will denote the 
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number of such measurement values within a measurement 2; by K, hence write: 
= 1 K 
& = riu } (6.1) 


We will use z* to refer to an individual measurement(e.g., one range value). We will 
approximate p(z; | 2, rn) by the product of the individual measurement likelihoods 


K 


р(ж&|т,т) = [[»G?|zom) (6.2) 
k=1 


Technically, this amounts to an independence assumption between the noise in each 
individual measurement value — just as our Markov assumption assumes independent 
noise over time (c.f., Chapter 2.4.4). This assumption is only true in the ideal case. 
We already discussed possible causes of dependent noise in Chapter 2.4.4. To recapit- 
ulate, dependencies typically exist due to a range of factors: people, who often corrupt 
measurements of several adjacent sensors; errors in the model m; approximations in 
the posterior, and so on. For now, however, we will simply not worry about violations 
of the independence assumption, as we will return to this issue at later chapters. 


6.2 MAPS 


To express the process of generating measurements, we need to specify the environ- 
ment in which a measurement is generated. A map of the environment is a list of 
objects in the environment and their locations. We have already informally discussed 
maps in the previous chapter, where we developed robot motion models that took into 
consideration the occupancy of different locations in the world. Formally, a map m is 
a list of objects in the environment along with their properties: 


m = Imi, ma,..., mw) (6.3) 


Here N is the total number of objects in the environment, and each m, with 1 < 
n < N specifies a property. Maps are usually indexed in one of two ways, knowns 
as feature-based and location-based. In feature-based maps, n is a feature index. The 
value of Mn contains, next to the properties of a feature, the Cartesian location of the 
feature. In location-based maps, the index n corresponds to a specific location. In 
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planar maps, it is common to denote а map element by Mg, instead of mn, to make 
explicit that Mg y is the property of a specific world coordinate, (x y). 


Both types of maps have advantages and disadvantages. Location-based maps are vol- 
umetric, in that they offer a label for any location in the world. Volumetric maps con- 
tain information not only about objects in the environment, but also about the absence 
of objects (e.g., free-space). This is quite different in feature-based maps. Feature- 
based maps only specify the shape of the environment at the specific locations, namely 
the locations of the objects contained in the map. Feature representation makes it eas- 
ler to adjust the position of an object, e.g., as a result of additional sensing. For this 
reason, feature-based maps are popular in the robotic mapping field, where maps are 
constructed from sensor data. In this book, we will encounter both types of maps—in 
fact, we will occasionally move from one representation to the other. 


A classical map representation is known as occupancy grid map, which will be dis- 
cussed in detail in Chapter 9. Occupancy maps are location-based: They assign to each 
x-y coordinate a binary occupancy value which specifies whether or not a location is 
occupied with an object. Occupancy grid maps are great for mobile robot navigation: 
They make it easy to find paths through the unoccupied space. 


Throughout this book, we will drop the distinction between the physical world and the 
map. Technically, sensor measurements are caused by physical objects, not the map 
of those objects. However, it is tradition to condition sensor models on the map т; 
hence we will adopt a notation that suggest measurements depend on the map. 


63 BEAM MODELS OF RANGE FINDERS 


Range finders are among the most popular sensors in robotics. Our first sensor model 
is therefore an approximative physical model of range finders. Range finders mea- 
sure the range to nearby objects. Range may be measured along a beam—which is a 
good model of the workings of laser range finders—or within a cone—which is the 
preferable model of ultrasonic sensors. 


6.3.1 The Basic Measurement Algorithm 


Our model incorporates four types of measurement errors, all of which are essential to 
making this model work: small measurement noise, errors due to unexpected objects, 
errors due to failures to detect objects, and random unexplained noise. The desired 


Measurements 125 


(a) Gaussian distribution pp (b) Exponential distribution psnorc 
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Figure 6.2 Components of the range fi nder sensor model. In each diagram the horizontal 
axis corresponds to the measurement zk, the vertical to the likelihood. 


model p(z; | £+, m) is therefore a mixture of four densities, each of which corresponds 


to a particular type of error: 


1. Correct range with local measurement noise. In an ideal world, a range finder 


would always measure the correct range to the nearest object in its measurement 
field. Let us use z** to denote the “true” range of the object measured by 2}. 
In location-based maps, the range z/* can be determined using ray casting; in 
feature-based maps, it is usually obtained by searching for the closest feature 
within a measurement cone. However, even if the sensor correctly measures the 
range to the nearest object, the value it returns is subject to error. This error arises 
from the limited resolution of range sensors, atmospheric effect on the measure- 
ment signal, and so on. This noise is usually modeled by a narrow Gaussian 
with mean gee and standard deviation с. We will denote the Gaussian by pni. 


Figure 6.2a illustrates this density рь. for a specific value of 2 
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In practice, the values measured by the range sensor are limited to the interval 
[0; zinax], where zmax denotes the maximum sensor range. Thus, the measure- 
ment probability is given by 


(6.4) 


Puis (27 | ж, m) = 


n N (28; AP VU) if 0 < 2f < zmax 
0 otherwise 


where ae is calculated from x; and m via ray tracing, and M (ars zb оў) 


denotes the univariate normal distribution with mean z7* and variance oĉ: 


(ЕЁ —zk*)2 
k. „kx 2 1 -3 ui. 
Мба ов) = a= е bi (6.5) 
V 270K it 
The normalizer 7 evaluates to 
Zmax -1 
k. ke 2 k 
U 3 ( N (265 20", Chit) act) (6.6) 
0 


The variance o is an intrinsic noise parameter of the measurement model. Be- 
low we will discuss strategies for setting this parameter. 


2. Unexpected objects. Environments of mobile robots are dynamic, whereas maps 
mare static. As a result, objects not contained in the map can cause range finders 
to produce surprisingly short ranges—at least when compared to the map. A 
typical example of moving objects are people that share the operational space 
of the robot. One way to deal with such objects is to treat them as part of the 
state vector and estimate their location; another, much simpler approach, is to 
treat them as sensor noise. Treated as sensor noise, unmodeled objects have the 
property that they cause ranges to be shorter than z**, not longer. 


More generally, the likelihood of sensing unexpected objects decreases with 
range. To see, imagine there are two people that independently and with the 
same, fixed likelihood show up in the perceptual field of a proximity sensor. One 
person's range is 21, and the second person's range 15 22. Let us further assume 
that zı < z2, without loss of generality. Then we are more likely to measure zı 
than zo. Whenever the first person is present, our sensor measures z1. However, 
for it to measure 25, the second person must be present and the first must be 
absent. 


Mathematically, the probability of range measurements in such situations is 
described by an exponential distribution. The parameter of this distribution, 
Ashort, 18 an intrinsic parameter of the measurement model. According to the 
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definition of an exponential distribution we obtain the following equation for 
Pshort (Z7 | z m): 


k 
А ет Men 0 < zb < К 
Pshort (20 |а) = өөн E (6.7) 
0 otherwise 


As in the previous case, we need a normalizer n since our exponential is limited to 
the interval [0; et Because the cumulative probability in this interval is given 
as 


Кж 


Zt 
= k on Еж = 
| Ashort € Ashort 2; dzF "— Ashort 24 +e Xshort 9 (6.8) 
0 
Кж 
zuo ue e short ži (6.9) 


the value of у can be derived as: 


1 
Heer E EP (6.10) 


k 
1 — Ee rshort 24" 


Figure 6.2b depicts this density graphically. This density falls off exponentially 
with the range z7. 


3. Failures. Sometimes, obstacles are missed altogether. For example, this happens 
frequently with sonar sensors when measuring a surface at a steep angle. Failures 
also occur with laser range finders when sensing black, light-absorbing objects, 
or when measuring objects in bright light. A typical result of sensor failures are 
max-range measurements: the sensor returns its maximum allowable value Zmax. 
Since such events are quite frequent, itis necessary to explicitly model max-range 
measurements in the measurement model. 


We will model this case with a point-mass distribution centered at Zmax: 


Іа 


0 otherwise (6.11) 


Dinas (2k | T m) = I(z = Zmax) = { 


Here J denotes the indicator function that takes on the value 1 if its argument is 
true, and is 0 otherwise. Technically, pmax does not possess a probability density 
function. This is because pmax is a discrete distribution. However, this shall 
not worry us here, as our mathematical model of evaluating the probability of a 
sensor measurement is not affected by the non-existence of a density function. 
(In our diagrams, we simply draw pmax as a very narrow uniform distribution 
centered at Zmax, so that we can pretend a density exists). 
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Figure 6.3 ‘Pseudo-density” of a typical mixture distribution p(z F | z«, m). 


4. Random measurements. Finally, range finders occasionally produce entirely 
unexplained measurements. For example, sonars often generate phantom read- 
ings when they bounce off walls, or when they are subject to cross-talk between 
different sensors. To keep things simple, such measurements will be modeled 
using a uniform distribution spread over the entire sensor measurement range 
[0; Russel: 


1 
Prana (zë |z;,m) = { E 


if0< zE < Zmax 


otherwise (6.12) 


Figure 6.2d shows the density of the distribution Prana- 


These four different distributions are now mixed by a weighted average, defined by 
the parameters Zhit» Zshort» Zmax» and Zrand with Zhit + Zshort + Zmax + Zrand = 1. 


T 
Zhit рп (27 | z,,m) 
k 
k Zshort Dshort (21 | z,,m) 
zi | xm) = . 6.13 
P( t | e ) Zmax eee | z,,m) ( ) 
Zrand Prana (zË | Tt m) 


A typical density resulting from this linear combination of the individual densities is 
shown in Figure 6.3 (with our visualization of the point-mass distribution Pmax as a 
small uniform density). As the reader may notice, the basic characteristics of all four 
basic models are still present in this combined density. 
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1: Algorithm beam range finder model(:;, x;, т): 

2: q=1 

3: for k = 1 to K do 

4: compute z}* for the measurement 2 using ray casting 

5: р = 2nit ` Prit (ZË | £t, m) + Zshort * Pshors (ZË | £e, m) 

6: +2max ` Ртах(2 | £t, M) + Zrand * Prana (Z7 | zs, m) 
T: qQ—4:p 

8: return q 


Table 6.1 Algorithm for computing the likelihood of a range scan 21, assuming condi- 
tional independence between the individual range measurements in the scan. 


The range finder model is implemented by the algorithm beam. range finder model 
in Table 6.1. The input of this algorithm is a complete range scan z+, a robot pose £+, 
and a map m. Its outer loop (Lines 2 and 7) multiplies the likelihood of individual 
sensor beams 27, following Equation (6.2). Line 4 applies ray casting to compute the 
noise-free range for a particular sensor measurement. The likelihood of each individ- 
ual range measurement z7 is computed in Line 5, which implements the mixing rule 
for densities stated in (6.13). After iterating through all sensor measurements zF in z;, 
the algorithm returns the desired probability p(z, | 21, m). 


6.3.2 Adjusting the Intrinsic Model 
Parameters 


In our discussion so far we have not addressed the question of how to choose the vari- 
ous parameters of the sensor model. These parameters include the mixing parameters 
Zhit» Zshort» Zmax; aNd Zana. They also include the parameters съ and Ао. We 
will refer to the set of all intrinsic parameters as O. Clearly, the likelihood of any 
sensor measurement is a function of O. 


One way to determine the intrinsic parameters is to rely on data. Figure 6.4 depicts 
two series of 10,000 measurements obtained with a mobile robot traveling through a 
typical office environment. Both plots show only range measurements for which the 
expected range was approximately 3 meter (between 2.9m and 3.1m). The left plot 
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(a) Sonar data (b) Laser data 
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Figure 6.4 Typical data obtained with (а) a sonar sensor and (b) a laser-range sensor in 
an offi ce environment for a “true” range of 300 cm and a maximum range of 500 cm. 


depicts the data for sonar sensors, and the right plot the corresponding data for laser 
sensors. In both plots, the z-axis shows the number of the reading (from 1 to 10,000), 
and the y-axis is the range measured by the sensor. Whereas most of the measurements 
are close to the correct range for both sensors, the behaviors of the sensors differ 
substantially. The ultrasound sensor appears to suffer from many more measurement 
noise and detection errors. Quite frequently it fails to detect an obstacle, and instead 
reports maximum range. In contrast, the laser range finder is more accurate. However, 
it also occasionally reports false ranges. 


A perfectly acceptable way to set the intrinsic parameters O is by hand: simply eyeball 
the resulting density until it agrees with your experience. Another, more principled 
way, is to learn these parameters from actual data. This is achieved by maximizing 
the likelihood of a reference data set Z = {2} with associated positions X = {x;} 
and map m, where each z; is an actual measurement, x; is the pose at which the 
measurement was taken, and m is the map. The likelihood of the data Z is given by 


р(2 | X, m, ©), (6.14) 


and our goal is to identify intrinsic parameters O that maximize this likelihood. Algo- 
rithms that maximize the likelihood of data are known as maximum likelihood estima- 
tors, or ML estimators in short. 


Table 6.2 depicts the algorithm learn intrinsic parameters, which is an algorithm 
for calculating the maximum likelihood estimate for the intrinsic parameters. As we 
shall see below, the algorithm is an instance of the expectation maximization algo- 
rithm, an iterative procedure for estimating ML parameters. Initially, the algorithm 
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SOOO asd vun 


15: 
16: 


Algorithm learn intrinsic parameters(Z, X, m): 


repeat until convergence criterion satisfied 
for all z; in Z do 
n = | pit (zi | zi, m) + Pshort (Zi | vi, m) 
+ Prax (7 | $i, M) + Prana (2: | vi, m) |- 
calculate 2* 


1 


Ci hit = 7 Pnis(zi | vi, т) 
€i short = 1] Pshort (2i | Ti, m) 
ĉi max = 7] Pmax(Zi | Ti, m) 


ĉi rand = 7] Pranda (ži | Ti, m) 


Zia, = |Z|} X5 eas 
Zshort = К yas Ci short 
Zmax — 12| Xe €i max 
Zrand — |Z|-1 ys; €i rand 
1 
Chit = ET У; ei nit (2i = 27)? 
Em У) ei sont 
Ashort = БЭЙ Еа 24 


return Ө = {Znit, Zshort; “max; тапа, Chit; Ashort } 


Table 6.2 Algorithm for learning the intrinsic parameters of the beam-based sensor model 
from data. 


learn_intrinsic_parameters requires a good initialization of the intrinsic parameters 
Chit and Asnorc. In Lines 3 through 9, it estimates auxiliary variables: Each e; xxx is 
the probability that the measurement 2; is caused by "xxx," where "xxx" is chosen 
from the four aspects of the sensor model, hit, short, max, and random. Subsequently, 
it estimates the intrinsic parameters in Lines 10 through 15. The intrinsic parameters, 
however, are a function of the expectations calculated before. Adjusting the intrinsic 
parameters causes the expectations to change, for which reason the algorithm has to be 
iterated. However, in practice the iteration converges quickly, and a dozen iterations 
are usually sufficient to give good results. 
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(a) Sonar data 
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(b) Laser data 


Figure 6.5 Approximation of the beam model based on (a) sonar data and (b) laser range 
data. The sensor models depicted on the left were obtained by a maximum likelihood 
approximation to the data sets depicted in Figure 6.4. 


Figure 6.5 graphically depicts four examples of data and the ML measurement model 
calculated by learn intrinsic parameters. The first row shows approximations to 
data recorded with the ultrasound sensor. The second row contains plots of two 
functions generated for laser range data. The columns correspond to different "true" 
ranges. The data is organized in histograms. One can clearly see the differences be- 
tween the different graphs. The smaller the range 2Ё* the more accurate the measure- 
ment. For both sensors the Gaussians are narrower for the shorter range than they are 
for the longer measurement. Furthermore, the laser range finder is more accurate than 
the ultrasound sensor, as indicated by the narrower Gaussians and the smaller number 
of maximum range measurements. The other important thing to notice is the relatively 
high likelihood of short and random measurements. This large error likelihood has a 
disadvantage and an advantage: On the negative side, it reduces the information in 
each sensor reading, since the difference in likelihood between a hit and a random 
measurement is small. On the positive side however, this model is less susceptible 
to unmodeled systematic perturbations, such as people who block the robot's path for 
long periods of time. 
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(a) Laser scan and part of the map 


(b) Likelihood for different positions 
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Figure 6.6 Probabilistic model of perception: (a) Laser range scan, projected into a pre- 
viously acquired map m. (b) The likelihood p(z: | £+, m), evaluated for all positions £+ 
and projected into the map (shown in gray). The darker a position, the larger p(zt | xz, m). 


Figure 6.6 illustrates the learned sensor model in action. Shown in Figure 6.6a is a 180 
degree range scan. The robot is placed in a previously acquired occupancy grid map 
at its true pose. Figure 6.6b plots a map of the environment along with the likelihood 
p(zi | £+, m) of this range scan projected into z-y-space (by maximizing over the 
orientation 0). The darker a location, the more likely it is. As is easily seen, all 
regions with high likelihood are located in the corridor. This comes at little surprise, 
as the specific scan is geometrically more consistent with corridor locations than with 
locations inside any of the rooms. The fact that the probability mass is spread out 
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throughout the corridor suggests that a single sensor scan is insufficient to determine 
the robot's exact pose. This is largely due to the symmetry of the corridor. The fact 
that the posterior is organized in two narrow bands is due to the fact that the orientation 
of the robot is unknown: each of these bands corresponds to one of the two surviving 
heading directions of the robot. 


6.3.3 Mathematical Derivation 


To derive the ML estimator, it shall prove useful to introduce auxiliary variables с;, 
the so-called correspondence variable. Each c; can take on one of four values, hit, 
short, max, and random, corresponding to the four possible mechanisms that might 
have produced a measurement 2;. 


Let us first consider the case in which the c;'s are known, that is, we know which of the 
four mechanisms described above caused each measurement z;. Based on the values 
of the c;'s, we can decompose Z into four disjoint sets, Zhit, Zshort: тах ANd аһа, 
which together comprise the set Z. The ML estimators for the intrinsic parameters 
Zhit» Zshort» max» aNd Zranq are simply the normalized ratios: 


Zhit | Zit | 

2, E УЛ 

ae | = || | 619 
Zrand |Zranal 


The remaining intrinsic parameters, Chit and Ashort, are obtained as follows. For the 
data set 2, we get from (6.5) 


P(Znit | X, т, Ө) 


l 
- 
3 
E 
Су 
B 
Е 
aD 


Thit (6.16) 
24 П Fane, Y m 


Here 27 is the “true” range, computed from the pose x; and the map m. A clas- 
sical trick of ML estimation is to maximize the logarithm of the likelihood, instead 
of the likelihood directly. The logarithm is a strictly monotonic function, hence the 
maximum of the log-likelihood is also the maximum of the original likelihood. The 
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log-likelihood is given by 


log p(Znit | X, т, Ө) 


І 


1 1(z; 2) 
У) -51082708 - 20227), (6.17) 


Zi€ Zhit 


which is now easily transformed as follows 


1 2 (2; — aer 
log p(Znit | Х,т,Ө) = (3 2: M MIU MES 
ZiC€ Zhit E 
1 25 — 27 ^ 
= с ||Znit| log 27 + 2|Znit| log onit 4 2, п | | 
2 Chit 
ZiC€ Zhit 


1 
= const. — |Znit| log Chit — 292. 5 (zi — 2)? 
Thit Zi€ Жын 


The derivative of this expression in the intrinsic parameter oy is as follows: 


log p(Znit | X, m, O) |Znit| 1 wD 
= — RE 4—8 
Oonit Chit E ( ) 


= » 
oO. 
hit жєн 


(6.19) 


The maximum of the log-likelihood is now obtained by setting this derivative to zero. 
From that we get the solution to our ML estimation problem. 


1 
Chit = Tr n у (ncm (6.20) 
| ы ZiC€ Zhit 


The estimation of the remaining intrinsic parameter Ashort proceeds just about in the 
same way. The posterior over the data Zshort is given by 


p(Zsnort | X,m, Ө) П Dshort (Zi | Ti, m) 


Zi€ Zshort 


= JL Aine nn (6.21) 


Zi E Zshort 


(6.18) 
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The logarithm is given by 


log D(Zsnort | Х, Th, Ө) = 5 log Ashort — Ashort Zi 
Zi€ Zshort 
=. |Zsnort log Ashort эш Ashort x 24 (6.22) 
Zi€ Zshort 


The first derivative of this expression with respect to the intrinsic parameter Aghort iS 
as follows: 


д1 % ог X, ‚Ө Zs or 
og p( x t | m, 9) = 2 hort | y? РА (6.23) 
short short AC Zins 
Setting this to zero gives us the ML estimate for the intrinsic parameter Asnort 
Zs or 
тнл © (6.24) 
ОРВИ 24 


This derivation assumed knowledge of the parameters с;. We now extend it to the 
case where the c;'s are unknown. As we shall see, the resulting ML estimation prob- 
lem lacks a closed-form solution. However, we can devise a technique that iterates 
two steps, one which calculates an expectation for the c;'s and one that computes the 
intrinsic model parameters under these expectations. The resulting algorithm is an in- 
stance of the expectation maximization algorithm, or EM in short. We will encounter 
EM in later chapters of this book, so this might be a good opportunity for the reader 
to familiarize herself with the basic EM algorithm. 


To derive EM, it will be beneficial to define the likelihood of the data Z first: 


log p(Z | X, m, 9) 
ж 5 log p(zi | zi, m, O) 


2:62 
= У 1оррһи(ж|т„т)+ У) logpaon(z | £i, m) 
Zi€ Zhit Zi€ Zshort 
Е 5 log Pmax (24 | Ti, m) kg 5 log Prana (2; | T) 
Zi€ Zmax Zi € Zrand 


(6.25) 
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This expression can be rewritten using the variables с;: 


logp(Z | X,m,©) = У Де = hit) log pnie(zi | vi, m) 
ZCZ 


HI (ci = short) log Psnort (2i | zi, m) 


-I (ci = max) log pmax(zi | xi, m) 
+I (ci = rand) log prana(zi | zi, m) (6.26) 


where J is the indicator function. Since the values for c; are unknown, it is common 
to integrate them out. Put differently, EM maximizes the expectation E[log p(Z | 
X, m, O)], where the expectation is taken over the unknown variables с;: 


Ellog p(Z | X, m, ©)] 
= Уу plai = hit) log puu(zi | zi, m) + p(ci = short) log Pshort (24 | £i, m) 


+р(с; = max) log paax(2; | xi, m) + p(c; = rand) log prana(z; | £i, m) 


=: Уе log pui(zi | vi, m) + Ei short log penort (2i | i,m) 
i 


TCi, max log Pmax (Zi | £i, m) T €i rand log Prana (Zi | zi, m) (6.27) 


With the definition of the variable e as indicated. This expression is maximized in two 
steps. In a first step, we consider the intrinsic parameters Chit and Ashort given and 
calculate the expectation over the variables c;. 


Ei,hit p(c; = hit) Puit (2i | vi, m) 

€i short ушш plci = short) = n Dshort (Zi | Ti, m) (6 28) 
Ci max plci = max) Pmax(Zi | Ti, m) : 

€i rand plci = rand) Prand (Zi | Ti, m) 


where the normalizer is given by 


n = [рьн(# | £i, M) + Pshort (zi | £i, m) 
+Dmax (Zi | Ti, m) EE Pranda (Zi | Ti, m) [77 (6.29) 


This step is called the “E-step,” indicating that we calculate expectations over the 
latent variables c;. The remaining step is now straightforward, since the expectations 
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decouple the dependencies between the different components of the sensor model. 
First, we note that the ML mixture parameters are simply the normalized expectations 


Zhit €i hit 
Zshort = 5y €i short 
` or = 12| 1 = or (6.30) 
max i i,max 
Zrand €i rand 


The ML parameters Chit and Ashort are then obtained analogously, by replacing the 
hard assignments in (6.20) and (6.24) by soft assignments weighted by the expecta- 
tions. 


1 
Chi = ОНОЈ €i hit (zi — 27)? (6.31) 


and 


2.57 €i short 


ет трете 6.32 
252. ez ĉi,short 21 ( ) 


Ashort 


6.3.4 Practical Considerations 


In practice, computing the densities of all sensor readings can be quite time- 
consuming. For example, laser range scanners often return hundreds of values per 
scan, at a rate of several scans per seconds. Since one has to perform a ray-tracing op- 
eration for each beam of the scan and every possible pose considered, the integration 
of the whole scan into the current belief cannot always be carried out in real-time. One 
typical approach to solve this problem is to incorporate only a small subset of all mea- 
surements (e.g., 8 equally spaced measurements per laser range scan instead of 360). 
This approach has an important additional benefit. Since adjacent beams of a range 
scan are often not independent, the state estimation process becomes less susceptible 
to correlated noise in adjacent measurements by leaving out measurements. 


When dependencies between adjacent measurements are strong, the ML model may 
make the robot overconfident and yield suboptimal results. One simple remedy is 
to replace p(z* | z;,m) by a “weaker” version p(z? | z;, m)? fora < 1. The 
intuition here is to reduce, by a factor of alpha, the information extracted from a sensor 
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measurement (the log of this probability is given by alog p(zF | z;,m)). Another 
possibility—which we will only mention here—is to learn the intrinsic parameters in 
the context of the application: For example, in mobile localization it is possible to 
train the intrinsic parameters via gradient descent to yield good localization results 
over multiple time steps. Such a multi-time step methodology is significantly different 
from the single time step ML estimator described above. In practical implementations 
it can yield superior results. 


The main drain of computing time for beam-based models is the ray casting operation. 
The runtime costs of computing p(z; | £+, m) can be substantially reduced by pre- 
computing the ray casting algorithm, and storing the result in memory—-so that the 
ray casting operation can be replaced by a (much faster) table lookup. An obvious 
implementation of this idea 15 to decompose the state space into a fine-grained three- 
dimensional grid, and to pre-compute the ranges z7* for each grid cell. This idea will 
be further investigated in Chapter 4.1. Depending on the resolution of the grid, the 
memory requirements can be significant. In mobile robot localization, we find that 
pre-computing the range with a grid resolution of 15 centimeters and 2 degrees works 
well for indoor localization problems. It fits well into the RAM for moderate-sized 
computers, yielding speedups by an order of magnitude over the plain implementation 
that casts rays online. 


6.4 LIKELIHOOD FIELDS FOR RANGE 
FINDERS 


6.4.1 Basic Algorithm 


The beam-based sensor model, while closely linked to the geometry and physics of 
range finders, suffers two major drawbacks. 


= Lack of smoothness. In cluttered environments with many small obstacles, the 
distribution p(z} | z;, т) can be very unsmooth in тү. Consider, for example, 
an environment with many chairs and tables (like a typical conference room). A 
robot like the ones shown in Chapter 1 will sense the legs of those obstacles. Ob- 
viously, small changes of a robot's pose x; can have a tremendous impact on the 
correct range of a sensor beam. As a result, the measurement model p(27 | z;, m) 
is highly discontinuous in x; (in particular in the heading direction 0,). Lack of 
smoothness has two problematic consequences. First, any approximate belief 
representation runs danger to miss the correct state, as nearby states might have 
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drastically different posterior likelihoods. This poses constraints on the accuracy 
of the approximation which, if not met, increase the resulting error in the poste- 
rior. Second, hill climbing methods for finding the most likely state are prone to 
local minima, due to the large number of local maxima in such unsmooth models. 


= Computational complexity. Evaluating p(zF | z;,m) for each single sensor 
measurement 2 involves ray casting, which is computationally expensive. As 
noted above, the problem can be partially remedied by pre-computing the ranges 
over a discrete grid in pose space. Such an approach shifts the computation 
into an initial off-line phase, with the benefit that the algorithm is faster at run 
time. However, the resulting tables are very large, since they cover a large three- 
dimensional space. Thus, pre-computing ranges is computationally expensive 
and requires substantial memory. 


We will now describe an alternative model, called likelihood field model, which over- 
comes these limitations. This model lacks a plausible physical explanation. In fact, 
it is an “ad hoc" algorithm that does not necessarily compute a conditional probabil- 
ity relative to any meaningful generative model of the physics of sensors. However, 
the approach works well in practice. In particular, the resulting posteriors are much 
smoother even in cluttered space, and the computation is typically more efficient. 


The key idea is to first project the end points of a sensor scan 2, into the global co- 
ordinate space of the map. To project an individual sensor measurement z7 into the 
global coordinate frame of the map m, we need to know where in global coordinates 
the robot's coordinate system is, where on the robot the sensor beam 2, originates, and 
where it points. As usual let x, = (x у 0)7 denote a robot pose at time t. Keeping 
with our two-dimensional view of the world, we denote the relative location of the 
sensor in the robot's fixed, local coordinate system by (x; ез» лады) з апа Ше ап- 
gular orientation of the sensor beam relative to the robot’s heading direction by Өк sens. 
These values are sensor-specific. The end point of the measurement z7 is now mapped 
into the global coordinate system via the obvious trigonometric transformation. 


T. =- (7), (© 0 —sin0 Zksens \ | jk cos(@ + Өк sens) 
Yzk Е y sin 0 cos 0 Uk,sens 2 sin(0 + Ox sens) 
(6.33) 


These coordinates are only meaningful when the sensor detects an obstacle. If the 
range sensor takes on its maximum value, that is, P = Zmax, these coordinates have 
no meaning in the physical world (even though the measurement does carry informa- 
tion). The likelihood field measurement model simply discards max-range readings. 
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(a) example environment (b) likelihood fi eld 


Figure 6.7 (а) Example environment with three obstacles (gray). The robot is located 
towards the bottom of the fi gure, and takes a measurement E as indicated by the dashed 
line. (b) Likelihood fi eld for this obstacle confi guration: the darker a location, the less 
likely it is to perceive an obstacle there. The probability p(zf | 24, m) for the specific 
sensor beam is shown in Figure ??. 


Similar to the beam model discussed before, we assume three types of sources of noise 
and uncertainty: 


1. Measurement noise. Noise arising from the measurement process is modeled 
using Gaussians. In x-y-space, this involves finding the nearest obstacle in the 
map. Let dist denote the Euclidean distance between the measurement coordi- 
nates (2. Yer) and the nearest object in the map m. Then the probability of 
a sensor measurement is given by a zero-centered Gaussian modeling the sensor 
noise: 


рь(2 | a4,m) = E02, (dist?) (6.34) 
Figure 6.7a depicts a map, and Figure 6.7b shows the corresponding Gaussian 
likelihood for measurement points (х.к y,«)^ in 2-D space. The brighter а 
location, the more likely it is to measure an object with a range finder. The density 
Dni: 15 now obtained by intersecting (and normalizing) the likelihood field by the 
sensor axis, indicated by the dashed line in Figure 6.7. The resulting function is 
the one shown in Figure 6.8a. 
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(a) pnis (2f. | ze, m) (b) р(22 | zt, m) 


01 O2 03 Zmax 01 02 03 Zmax 
Figure 6.8 (а) Probability pit (2%) as а function of the measurement 28) for the situation 
depicted in Figure 6.7. Here the sensor beam passes by three obstacles, with respective 
nearest points 01, 02, and оз. (b) Sensor probability p(zF | 2+, т), obtained for the 
situation depicted in Figure 6.7, obtained by adding two uniform distributions. 


2. Failures. As before, we assume that max-range readings have a distinct large 
likelihood. As before, this is modeled by a point-mass distribution pmax. 


3. Random measurements. Finally, a uniform distribution pang is used to model 
random noise in perception. 


Just as for the beam-based sensor model, the desired probability p(z7 | £+, rn) inte- 
grates all three distributions: 


Zhit ` Dhit + Zrand* Prand + Zmax* Pmax (6.35) 


using the familiar mixing weights 2, Zrand, and Zmax. Figure 6.8b shows an exam- 
ple of the resulting distribution p(z} | z;, m) along a measurement beam. It should 
be easy to see that this distribution combines рь, as shown in Figure 6.8a, and the 
distributions Pmax and Prana. Much of what we said about adjusting the mixing pa- 
rameters transfers over to our new sensor model. They can be adjusted by hand, or 
learned using the ML estimator. A representation like the one in Figure 6.7b, which 
depicts the likelihood of an obstacle detection as a function of global z-y-coordinates, 
is called the likelihood field. 


Table 6.3 provides an algorithm for calculating the measurement probability using the 
likelihood field. The reader should already be familiar with the outer loop, which 
multiplies the individual values of р(2 | z;, т), assuming independence between 
the noise in different sensor beams. Line 4 checks if the sensor reading is a max 
range reading, in which case it is simply ignored. Lines 5 to 8 handle the interesting 
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1: Algorithm likelihood field range finder model(z;, x;, m): 

2: q=1 

3: for all k do 

4: if 2E oue 

5: Tyk = T + Tk sens COS 0 — Yk sens Sin Ө + zE cos(0 + Ok sens) 

6: УЕ = Y + Yk sens COS Ө + Xi sens Sin Ó + 2 sin(0 + Өк sens) 

T: dist? = min [Ga — a)? + (yx — y)? | (x^, y) occupied in m} 
a’,y’ t t 


q=q: E : prob(dist?, o) + аа) 


Zmax 


9: return q 


Table 6.3 Algorithm for computing the likelihood of a range fi nder scan using Euclidean 
distance to the nearest neighbor. The function prob(dist?, cg. 0) computes the probability 


of the distance under a zero-centered Gaussian distribution with variance Si 


case: Here the distance to the nearest obstacle in x-y-space is computed (Line 7), 
and the resulting likelihood is obtained in Line 8 by mixing a normal and a uniform 
distribution (the function prob(dist?, o?,,) computes the probability of dist? under a 
zero-centered Gaussian distribution with variance Gi). 


The most costly operation in algorithm likelihood field range finder model is the 
search for the nearest neighbor in the map (Line 7). To speed up this search, it is 
advantageous to pre-compute the likelihood field, so that calculating the probability of 
a measurement amounts to a coordinate transformation followed by a table lookup. Of 
course, if a discrete grid is used, the result of the lookup is only approximate, in that 
it might return the wrong obstacle coordinates. However, the effect on the probability 
p(z | 21, m) is typically small even for moderately course grids. 


6.4.2 Extensions 


A key advantage of the likelihood field model over the beam-based model discussed 
before is smoothness. Due to the smoothness of the Euclidean distance, small changes 
in the robot's pose x; only have small effects on the resulting distribution p(z% | 


144 CHAPTER 6 


Figure 6.9 (a) Occupancy grid map of the San Jose Tech Museum, (b) pre-processed 
likelihood fi eld. 


ж, m). Another key advantage is that the pre-computation takes place in 2D, instead 
of 3D, increasing the compactness of the pre-computed information. 


However, the current model has three key disadvantages: First, it does not explicitly 
model people and other dynamics that might cause short readings. Second, it treats 
sensors as if they can "see through walls. This is because the ray casting opera- 
tion was replaced by a nearest neighbor function, which is incapable of determining 
whether a path to a point is intercepted by an obstacle in the map. And third, our 
approach does not take map uncertainty into account. In particular, it cannot handle 
unexplored areas, that is, areas for which the map is highly uncertain or unspecified. 


The basic algorithm likelihood field range finder model can be extended to dimin- 
ish the effect of these limitations. For example, one might sort map occupancy values 
into three categories: occupied, free, and unknown, instead of just the first two. When 
а sensor measurement 22 falls into the category unknown, its probability p(z7 | x+, т) 
is assumed to be the constant value = The resulting probabilistic model is that in 


the unexplored space, every sensor measurement is equally likely, hence p(zF | z;, m) 
is modeled by a uniform distribution with density a. Figure 6.9 shows a map and 
the corresponding likelihood field. Here again the gray-level of an x-y-location indi- 
cates the likelihood of receiving a sensor reading there. The reader may notice that the 
distance to the nearest obstacle is only employed inside the map, which corresponds to 
the explored terrain. Outside, the likelihood p(z | z;, m) is a constant. For computa- 
tional efficiency, it is worthwhile to precompute the nearest neighbor for a fine-grained 
2-D grid. 
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(a) sensor scan (b) likelihood fi eld 


eau leniter Meee А, 


Figure 6.10 (a) Sensor scan, from a bird's eye perspective. The robot is placed at the 
bottom of this fi gure, generating a proximity scan that consists of the 180 dots in front of 
the robot. (b) Likelihood function generated from this sensor scan. The darker a region, the 
smaller the likelihood for sensing an object there. Notice that occluded regions are white, 
hence infer no penalty. 


Likelihood fields over the visible space can also be defined for the most recent scan, 
which in fact defines a local map. A straightforward extension is then to match a 
scan and a map symmetrically: In addition to calculating the likelihood of a scan field 
inside the map, a symmetric extension also calculates the likelihood of each (nearby) 
object in the map object relative to the likelihood field of a scan. Such a symmetric 
routine will be of importance in future chapters on mapping, in which we seek to align 
scans with maps. In the interest of brevity, we omit the derivation of the resulting 
algorithm, which in fact is mostly a straightforward extension of the one shown in 
Table 6.3. However, we note that the leading implementations of the likelihood field 
technique rely on the extended symmetric algorithm. 


65 CORRELATION-BASED SENSOR 
MODELS 


There exists a number of range sensor models in the literature that measure correla- 
tions between a measurement and the map. A common technique is known as map 
matching. Map matching requires techniques discussed in later chapters of this book, 
namely the ability to transform scans into occupancy maps. Typically, map matching 
compiles small numbers of consecutive scans into local maps, denoted тоса. Fig- 
ure 6.11 shows such a local map, here in the form of an occupancy grid map. The 
sensor measurement model compares the local map тоса to the global map m, such 
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Figure 6.11 Example of a local map generated from 10 range scans, one of which is 
shown. 


that the more similar m and тоса, the larger р(тиоса | £4, m). Since the local map 
is represented relative to the robot location, this comparison requires that the cells of 
the local map are transformed into the coordinate framework of the global map. Such 
a transformation can be done similar to the coordinate transform (6.33) of sensor mea- 
surements used in the likelihood field model. If the robot is at location 2;, we denote 
by т. у лоса (2+) the grid cell in the local map that corresponds to (x у)” in global 
coordinates. Once both maps are in the same reference frame, they can be compared 
using the map correlation function, which is defined as follows: 


dey May — ТЇ). (may oca (1«) — M) 


Pm,Mioca,te = mm =. (6.36) 
Vf dey (Mey — m) ey (Me,y local (Zt) = т) 


Here the sum is evaluated over cells defined in both maps, and mn is the average тар 
value: 


1 
= 2N (mz, + Tha, y local); (6.37) 
2,у 


where N denotes the number of elements in the overlap between the local and global 
map. The correlation рњ, scales between +1. Map matching interprets the 
value 


p(miocal | Tt, m) = MAX Онега Й 0} (6.38) 
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as the probability of the local map conditioned on the global map m and the robot 
pose x+. If the local map is generated from a single range scan z+, this probability 
substitutes the measurement probability p(z; | x+, m). 


Map matching has a number of nice properties: just like the likelihood field model, it is 
easy to compute, though it does not yield smooth probabilities in the pose parameter 
т. One way to approximate the likelihood field (and to obtain smoothness) is to 
convolve the map m with a Gaussian smoothness kernel, and to run map matching on 
this smoothed map. 


A key advantage of map matching over likelihood fields is that it explicitly considers 
the free-space in the scoring of two maps; the likelihood field technique only considers 
the end point of the scans, which by definition correspond to occupied space (or noise). 
On the other hand, many mapping techniques build local maps beyond the reach of the 
sensors. For example, many techniques build circular maps around the robot, setting 
to 0.5 areas beyond the range of actual sensor measurements. In such cases, there is 
a danger that the result of map matching incorporates areas beyond the actual mea- 
surement range, as if the sensor can “see through walls.” Such side-effects are found 
in a number of implemented map matching techniques. А further disadvantage is that 
map matching does not possess a plausible physical explanation. Correlations are the 
normalized quadratic distance between maps, which is not the noise characteristic of 
range sensors. 


6.66 FEATURE-BASED SENSOR MODELS 


6.6.1 Feature Extraction 


The sensor models discussed thus far are all based on raw sensor measurements. An 
alternative approach is to extract features from the measurements. If we denote the 
feature extractor as a function f, the features extracted from a range measurement are 
given by f(z,). Most feature extractors extract a small number of features from high- 
dimensional sensor measurements. A key advantage of this approach is the enormous 
reduction of computational complexity: While inference in the high-dimensional mea- 
surement space can be costly, inference in the low-dimensional feature space can be 
orders of magnitude more efficient. 


The discussion of specific algorithms for feature extraction is beyond the scope of 
this book. The literature offers a wide range of features for a number of different 
sensors. For range sensors, it is common to identify lines, corners, or local minima 
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in range scans, which correspond to walls, corners, or objects such as tree trunks. 
When cameras are used for navigation, the processing of camera images falls into the 
realm of computer vision. Computer vision has devised a myriad of feature extraction 
techniques from camera images. Popular features include edges, distinct patterns, 
and objects of distinct appearance. In robotics, it is also common to define places as 
features, such as hallways and intersections. 


6.6.2 Landmark Measurements 


In many robotics applications, features correspond to distinct objects in the physical 
world. For example, in indoor environments features may be door posts or window 
stills; outdoors they may correspond to tree trunks or corners of buildings. In robotics, 
it is common to call those physical objects landmarks, to indicate that they are being 
used for robot navigation. 


The most common model for processing landmarks assumes that the sensor can mea- 
sure the range and the bearing of the landmark relative to the robot's local coordinate 
frame. This is not an implausible assumption: Any local feature extracted from range 
scans come with range and bearing information, as do visual features detected by 
stereo vision. In addition, the feature extractor may generate a signature. In this book, 
we assume a signature is a numerical value (e.g., an average color); it may equally be 
an integer that characterizes the type of the observed landmark, or a multi-dimensional 
vector characterizing a landmark (e.g., height and color). 


If we denote the range by r, the bearing by Фф, and the signature by s, the feature vector 
is given by a collection of triplets 


rl r2 
Fed = {4 ..} = {| 4 |.| a || get (6.39) 
5i 52 


The number of features identified at each time step is variable. However, many prob- 
abilistic robotic algorithms assume conditional independence between features, that 
15, 


(f(z) | arm) = [piri si] enm) (6.40) 
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Conditional independence applies if the noise in each individual measurement 
(ri $i si)" is independent of the noise in other measurements (r? ¢; sj)! (for 
i Æ j). Under the conditional independence assumption, we can process one feature 
at-a-time, just as we did in several of our range measurement models. This makes it 
much easier to develop algorithms that implement probabilistic measurement models. 


Let us now devise a sensor model for features. In the beginning of this chapter, we dis- 
tinguished between two types of maps: feature-based and location-based. Landmark 
measurement models are usually defined only for feature-based maps. The reader may 
recall that those maps consist of list of features, m = (m4, mz2,...}. Each feature may 
possess a signature and a location coordinate The location of a feature, denoted т; „, 
and M; y, is simply its coordinate in the global coordinate frame of the map. 


The measurement vector for a noise-free landmark sensor is easily specified by the 
standard geometric laws. We will model noise in landmark perception by independent 
Gaussian noise on the range, bearing, an the signature. The resulting measurement 
model is formulated for the case where the 2-th feature at time t corresponds to the 
j-th landmark in the map. As usual, the robot pose is given by x; = (x y 0)T. 


ri V (mj; — £)? + (ms, — y)? Eo? 
| = atan2(m;, — Y, Mjæ —2)—0 | + Eo? 
si Sj Eq? 


(6.41) 


Here €,2, £52 , and €,2 are zero-mean Gaussian error variables with variances c2, с, 
E а 


and c2, respectively. 


6.6.3 Sensor Model With Known 
Correspondence 


To implement this measurement model, we need to define a variable that establishes 
correspondence between the feature fý and the landmark m; in the map. This variable 
will be denoted by ci with сї € {1,..., N + 1}; N is the number of landmarks in the 
map m. If сї = j < №, then the i-th feature observed at time t corresponds to the 
j-th landmark in the map. In other words, сї is the true identity of an observed feature. 
The only exception occurs with сї = N + 1: Here a feature observation does not 
correspond to any feature in the map m. This case is important for handling spurious 
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k Algorithm landmark model known. correspondence( f/, сї, £+, m): 
2: Jus 

3: ? = (mja — T)? + (Mjy — y)? 

4: ф= atan2(m;, — y, mj, — c) 

5: q = prob(ri — ?, 02) - prob(¢i — à, 05): prob(s} — sj, o2) 

6: return q 


Table 6.4 Algorithm for computing the likelihood of a landmark measurement. The al- 
gorithm requires as input an observed feature fi = (ri фі Sie, and the true identity of 
the feature сї, the robot pose 2; = (2 y 6)", and the map m. It's output is the numerical 
probability p( f? | ci, m, xt). 


landmarks; it is also of great relevance for the topic of robotic mapping, in which the 
robot regularly encounters previously unobserved landmarks. 


Table 6.4 depicts the algorithm for calculating the probability of a feature fë with 
known correspondence сї < N. Lines 3 and 4 calculate the true range and bearing to 
the landmark. The probability of the measured ranges and bearing is then calculated 
in Line 5, assuming independence in the noise. As the reader easily verifies, this 
algorithm implements Equation (6.41). 


6.6.4 Sampling Poses 


Sometimes it is desirable to sample robot poses х; that correspond to a measurement 
fi with feature identity сї. We already encountered such sampling algorithms in the 
previous chapter, where we discussed robot motion models. Such sampling models 
are also desirable for sensor models. For example, when localizing a robot globally, it 
shall become useful to generate sample poses that incorporate a sensor measurement 
to generate initial guesses for the robot pose. 


While in the general case, sampling poses z; that correspond to a sensor measurement 
2 1s difficult, for our landmark model we can actually provide an efficient sampling 
algorithm. However, such sampling is only possible under further assumptions. In 
particular, we have to know the prior p(z; | ci, т). For simplicity, let us assume this 
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k Algorithm sample landmark model known. correspondence( f}, сї, m): 
2: j exe 

3: ^ — rand(0, 27) 

4: f = ri + sample(c2) 

5: b= фі + sample(oc;) 

6: ж = Mj fcos^ 

7: Y = Mjy Б? іп Ау 

8: 0=4-т-– ф 

9: return (x у 0)" 


Table 6.5 Algorithm for sampling poses from a landmark measurement fi = 
(ri $i st)? with known identity сї. 


prior is uniform (it generally is not!). Bayes rule then suggests that 


p(x | fi cim) = n »(fi | cz, m) plz: | cj, m) 
n Cft | ci, zi, m) (6.42) 


Sampling from p(x; | ft, ci, m) can now be achieved from the “inverse” of the sen- 
sor model p(f? | ci,z,,m). Table 6.5 depicts an algorithm that samples poses тү. 
The algorithm is tricky: Even in the noise-free case, a landmark observation does not 
uniquely determine the location of the robot. Instead, the robot may be on a circle 
around the landmark, whose diameter is the range to the landmark. The indetermi- 
nacy of the robot pose also follows from the fact that the range and bearing provide 
two constraints in a three-dimensional space of robot poses. 


To implement a pose sampler, we have to sample the remaining free parameter, which 
determines where on the circle around the landmark the robot is located. This parame- 
ter is called 4 in Table 6.5, and is chosen at random in Line 3. Lines 4 and 5 perturb the 
measured range and bearing, exploiting the fact that the mean and the measurement 
are treated symmetrically in Gaussians. Finally, Lines 6 through 8 recover the pose 
that corresponds to ^, 7, and d. 


Figure 6.12 illustrates the pose distribution р(х, | ff, c, т) (left diagram) and also 
shows a sample 
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Figure 6.12 Landmark detection model: (a) Posterior distribution of the robot’s pose 
given that it detected a landmark in 5m distance and 30deg relative bearing (projected onto 
2D). (b) Sample robot poses generated from such a detection. The lines indicate the orien- 
tation of the poses. 


drawn with our algorithm sample landmark model known. correspondence (right 
diagram). The posterior is projected into z-y-space, where it becomes a ring around 
the measured range тї. In 3-D pose space, it is a spiral that unfolds the ring with the 
angle 0. 


6.6.5 Further Considerations 


Both of our algorithms for landmark-based measurements assume known correspon- 
dence. The case of unknown correspondence will be discussed in detail below, when 
we address algorithms for localization and mapping under unknown correspondence. 


We also introduced a signature value for landmarks. Most published algorithms do not 
make the use of apearance features explicit. When the signature is not provided, all 
landmarks look equal, and the data association problem of estimating the correspon- 
dence variables is even harder. We have included the signature in our model because 
it is a valuable source of information that can often be easily extracted form the sensor 
measurements. 


As noted above, the main motivation for using features instead of the full measurement 
vector is computational in nature: It is much easier to manage a few hundred features 
than a few billion range measurements. Our model presented here is extremely crude, 
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and it clearly does not capture the physical laws that underly the feature formation 
process. Nevertheless, the model tends to work well in a great number of applications. 


It is important to notice that the reduction of measurements into features comes at a 
price. In the robotics literature, features are often (mis-)taken for a sufficient statistic 
of the measurement vector z;, that is 


p(f(z)| vum) ~ p(z |2 m) (6.43) 


In practice, however, a lot of information is sacrificed by using features instead of the 
full measurement vector. This lost information makes certain problems more difficult, 
such as the data association problem of determining whether or not the robot just re- 
visited a previously explored location. It is easy to understand the effects of feature 
extraction by introspection: When you open your eyes, the visual image of your envi- 
ronment is probably sufficient to tell you unambiguously where you are—even if you 
were globally uncertain before. If, on the other hand, you only sense certain features, 
such as the relative location of door posts and window stills, you would probably be 
much less certain as to where you are. Quite likely the information may be insufficient 
for global localization. 


With the advent of fast computers, features have gradually lost importance in the field 
of robotics. Especially when using range sensors, most state-of-the-art algorithms rely 
on dense measurement vectors, and they use dense location-based maps to represent 
the environment. Nevertheless, features are of great importance. They enable us to 
introduce the basic concepts in probabilistic robotics, and with proper treatement of 
problems such as the correspondence problem they can be brought to bear even in 
cases where maps are cmposes of dense sets of scan points. For this reason, a num- 
ber of algorithms in this book are first described for feature representations, and then 
extended into algorithms using raw sensor measurements. 


6.7 PRACTICAL CONSIDERATIONS 


This section surveyed a range of measurement models. We placed a strong emphasis 
on models for range finders, due to their great importance in robotics. However, the 
models discussed here are only representatives of a much broader class of probabilistic 
models. In choosing the right model, it is important to trade off physical realism with 
properties that might be desirable for an algorithm using these models. For example, 
we noted that a physically realistic model of range sensors may yield probabilities 
that are not smooth in the alleged robot pose—which in turn causes problems for 
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algorithms such as particle filters. Physical realism is therefore not the only criterion 
in choosing the right sensor model; an equally important criterion is the goodness of a 
model for the algorithm that utilizes it. 


As a general rule of thumb, the more accurate a model, the better. In particular, the 
more information we can extract from a sensor measurement, the better. Feature-based 
models extract relatively little information, by virtue of the fact that feature extractors 
project high-dimensional sensor measurements into lower dimensional space. As a 
result, feature-based methods tend to produce inferior results. This disadvantage is 
offset by superior computational properties of feature-based representations. 


When adjusting the intrinsic parameters of a measurement model, it is often useful 
to artificially inflate the uncertainty. This is because of a key limitation of the prob- 
abilistic approach: To make probabilistic techniques computationally tractable, we 
have to ignore dependencies that exist in the physical world, along with a myriad of 
latent variables that cause these dependencies. When such dependencies are not mod- 
eled, algorithms that integrate evidence from multiple measurements quickly become 
overconfident. Such overconfidence can ultimately lead to wrong conclusions, which 
negatively affects the results. In practice, it is therefore a good rule of thumb to reduce 
the information conveyed by a sensor. Doing so by projecting the measurement into 
a low-dimensional feature space is one way of achieving this. However, it suffers the 
limitations mentioned above. Uniformly decaying the information by exponentiating a 
measurement model with a parameter a, as discussed in Section 6.3.4, is a much better 
way, in that it does not introduce additional variance in the outcome of a probabilistic 
algorithm. 


68 SUMMARY 


This section described probabilistic measurement models. 


m Starting with models for range finders—and lasers in particular—we also dis- 
cussed measurement models p(z7 | z;, n). The first such model used ray casting 
to determine the shape of p(z* | £+, m) for particular maps m and poses тү. We 
devised a mixture model that addressed the various types of noise that can affect 
range measurements. 


m We devised a maximum likelihood technique for identifying the intrinsic noise 
parameters of the measurement model. Since the measurement model is a mix- 
ture model, we provided an iterative procedure for maximum likelihood estima- 
tion. Our approach was an instance of the expectation maximization algorithm, 
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which alternates a phase that calculates expectations over the type of error un- 
derlying a measurement, with a maximization phase that finds in closed form the 
best set of intrinsic parameters relative to these expectations. 


m  Analternative measurement model for range finders is based on likelihood fields. 
This technique used the nearest distance in 2-D coordinates to model the prob- 
ability p(zF | z,, m). We noted that this approach tends to yield smoother dis- 
tributions p(zF | z;,m). This comes at the expense of undesired side effects: 
The likelihood field technique ignores information pertaining to free-space, and 
it fails to consider occlusions in the interpretation of range measurements. 


m А third measurement model is based оп map matching. Map matching maps 
sensor scans into local maps, and correlates those maps with global maps. This 
approach lacks a physical motivation, but can be implemented very efficiently. 


m We discussed how pre-computation can reduce the computational burden at run- 
time. In the beam-based measurement model, the pre-computation takes place in 
3-D; the likelihood field requires only a 2-D pre-computation. 


m We presented a feature-based sensor model, in which the robot extracts the range, 
bearing, and signature of nearby landmarks. Feature-based techniques extract 
from the raw sensor measurement distinct features. In doing so, they reduce the 
dimensionality of the sensor measurement by several orders of magnitude. 


m At the end of the chapter, a discussion on practical issues pointed out some of the 
pitfalls that may arise in concrete implementations. 
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MOBILE ROBOT LOCALIZATION 


7.1 INTRODUCTION 


This chapter presents a number of probabilistic algorithms for mobile robot local- 
ization. Mobile robot localization is the problem of determining the pose of a robot 
relative to a given map of the environment. It is often called position estimation or 
position tracking. Mobile robot localization is an instance of the general localiza- 
tion problem, which is the most basic perceptual problem in robotics. This is because 
nearly all robotics tasks require knowledge of the location of the robots and the objects 
that are being manipulated (although not necessarily within a global map). 


Localization can be seen as a problem of coordinate transformation. Maps are de- 
scribed in a global coordinate system, which is independent of a robot's pose. Lo- 
calization is the process of establishing correspondence between the map coordinate 
system and the robot's local coordinate system. Knowing this coordinate transforma- 
tion enables the robot to express the location of objects of interests within its own 
coordinate frame—a necessary prerequisite for robot navigation. As the reader easily 
verifies, knowing the pose x; = (x y 0)7 of the robot is sufficient to determine this 
coordinate transformation, assuming that the pose is expressed in the same coordinate 
frame as the map. 


Unfortunately—and herein lies the problem of mobile robot localization—the pose 
can usually not be sensed directly. Put differently, most robots do not possess a (noise- 
free!) sensor for measuring pose. The pose has therefore to be inferred from data. A 
key difficulty arises from the fact that a single sensor measurement is usually insuf- 
ficient to determine the pose. Instead, the robot has to integrate data over time to 
determine its pose. To see why this is necessary, just picture a robot located inside a 
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Figure 7.1 Example maps used for robot localization: a 2D metric layout, a graph-like 
topological map, and an image mosaic of a ceiling 


building where many corridors look alike. Here a single sensor measurement (e.g., a 
range scan) is usually insufficient to disambiguate the identity of the corridor. 


Localization has been applied in conjunction with a broad set of map representations. 
We already discussed two types of maps in the previous chapter: feature-based and 
location-based. An example of the latter were occupancy grid maps, which were in- 
formally discussed and are subject to a later chapter in this book. Instances of such 
maps are shown in Figure 7.1. This figure shows a hand-drawn metric 2-D map, a 
graph-like topological map, and an image mosaic of a ceiling (which can also be used 
as a map). The space of map representations used in today's research is huge. A 
number of later chapters will investigate specific map types and discuss algorithms for 
acquiring maps from data. Localization assumes that an accurate map is available. 


In this and the subsequent chapter, we will present some basic probabilistic algorithms 
for mobile localization. АП of these algorithms are variants of the basic Bayes filter 
described in Chapter 2. We will discuss the advantages and shortcomings of each 
representation and associated algorithm. The chapter also goes through a series of ex- 
tensions that address different localization problems, as defined through the following 
taxonomy of robot localization problems. 


7.22 A TAXONOMY OF LOCALIZATION 
PROBLEMS 


Not every localization problem is equally hard. To understand the difficulty of a local- 
ization problem, we will now discuss a brief taxonomy of localization problems. This 
taxonomy will divide localization problems along a number of important dimensions 
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pertaining to the nature of the environment and the initial knowledge that a robot may 
possess relative to the localization problem. 


Local Versus Global Localization 


Localization problems are characterized by the type of knowledge that is available 
initially and at run-time. We distinguish three types of localization problems with an 
increasing degree of difficulty. 


= Position tracking. Position tracking assumes that the initial robot pose is known. 
Localizing the robot can be achieved by accommodating the noise in robot mo- 
ton. The effect of such noise is usually small. Hence, methods for position 
tracking often rely on the assumption that the pose error is small. The pose un- 
certainty is often approximated by a unimodal distribution (e.g., a Gaussian). The 
position tracking problem is a /ocal problem, since the uncertainty is local and 
confined to region near the robot's true pose. 


= Global localization. Here the initial pose of the robot is unknown. The robot is 
initially placed somewhere in its environment, but it lacks knowledge of where 
itis. Approaches to global localization cannot assume boundedness of the pose 
error. As we shall see later in this chapter, unimodal probability distributions are 
usually inappropriate. Global localization is more difficult than position tracking; 
in fact, it subsumes the position tracking problem. 


= Kidnapped robot problem. This problem is a variant of the global localization 
problem, but one that is even more difficult. During operation, the robot can get 
kidnapped and teleported to some other location. The kidnapped robot problem 
is more difficult than the global localization problem, in that the robot might 
believe it knows where it is while it does not. In global localization, there robots 
knows that it doesn't know where it is. One might argue that robots are rarely 
kidnapped in practice. The practical importance of this problem, however, arises 
from the observation that most state-of-the-art localization algorithms cannot be 
guaranteed never to fail. The ability to recover from failures is essential for truly 
autonomous robots. Testing a localization algorithm by kidnapping it measures 
its ability to recover from global localization failures. 


Static Versus Dynamic Environments 


A second dimension that has a substantial impact on the difficulty of localization is 
the environment. Environments can be static or dynamic. 
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m  Staticenvironments. Static environments are environments where the only vari- 
able quantity (state) is the robot's pose. Put differently, only the robot moves in 
static environment. АП other objects in the environments remain at the same lo- 
cation forever. Static environments have some nice mathematical properties that 
make them amenable to efficient probabilistic estimation. 


= Dynamic environments. Dynamic environments possess objects other than the 
robot whose location or configuration changes over time. Of particular interest 
are changes that persist over time, and that impact more than a single sensor 
reading. Changes that are not measurable are of course of no relevance to local- 
ization, and those that affect only a single measurement are best treated as noise 
(cf. Chapter 2.4.4). Examples of more persistent changes are: people, daylight 
(for robots equipped with cameras), movable furniture, or doors. Clearly, most 
real environment are dynamic, with state changes occurring at a range of different 
speeds. 


Obviously, localization in dynamic environments is more difficult than localization in 
static ones. There are two principal approaches for accommodating dynamics: First, 
dynamic entities might be included in the state vector. As a result, the Markov as- 
sumption might now be justified, but such an approach carries the burden of additional 
computational and modeling complexity; in fact, the resulting algorithm becomes ef- 
fectively a mapping algorithm. Second, in certain situations sensor data can be filtered 
so as to eliminate the damaging effect of unmodeled dynamics. Such an approach will 
be described further below in Section 8.4. 


Passive Versus Active Approaches 


A third dimension that characterizes different localization problems pertains to the 
fact whether or not the localization algorithm controls the motion of the robot. We 
distinguish two cases: 


m Passive localization. In passive approaches, the localization module only ob- 
serves the robot operating. The robot is controlled through some other means, 
and the robot's motion is not aimed at facilitating localization. For example, the 
robot might move randomly or perform its everyday's tasks. 


= Active localization. Active localization algorithms control the robot so as to 
minimize the localization error and/or the costs arising from moving a poorly 
localized robot into a hazardous place. 
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Figure 7.2 Example situation that shows a typical belief state during global localization in 
a locally symmetric environment. The robot has to move into one of the rooms to determine 
its location. 


Active approaches to localization typically yield better localization results than passive 
ones. We already discussed an examples in the introduction to this book: coastal 
navigation. A second example situation is shown in Figure 7.2. Here the robot is 
located in a symmetric corridor, and its belief after navigating the corridor for a while 
is centered at two (symmetric) poses. The local symmetry of the environment makes 
it impossible to localize the robot while in the corridor. Only if it moves into a room 
will it be able to eliminate the ambiguity and to determine its pose. It is situations like 
these where active localization gives much better results: Instead of merely waiting 
until the robot incidentally moves into a room, active localization can recognize the 
impasse and send it there directly. 


However, a key limitation of active approaches is that they require control over the 
robot. Thus, in practice, an active localization technique alone tends to be insuffi- 
cient: The robot has to be able to localize itself even when carrying out some other 
task than localization. Some active localization techniques are built on top of a pas- 
sive technique. Others combine tasks performance goals with localization goals when 
controlling a robot. 
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This chapter exclusively considers passive localization algorithms. We will return 
to the issue of active localization in Chapter ?? of this book, where we will present 
probabilistic algorithms for robot control. 


Single-Robot Versus Multi-Robot 


A fourth dimension of the localization problem is related to the number of robots 
involved. 


=  Single-robot localization. The most commonly studied approach to localization 
deals with a single robot only. Single robot localization offers the convenience 
that all data is collected at a single robot platform, and there is no communication 
issue. 


=  Multi-robot localization. The localization problem naturally arises to teams of 
robots. At first glance, each robot could localize itself individually, hence the 
multi-robot localization problem can be solved through single-robot localization. 
If robots are able to detect each other, however, there is the opportunity to do 
better. This is because one robot's belief can be used to bias another robot's belief 
if knowledge of the relative location of both robots is available. The issue of 
multi-robot localization raises interesting, non-trivial issues on the representation 
of beliefs and the nature of the communication between them. 


These four dimensions capture the four most important characteristics of the mobile 
robot localization problem. There exist a number of other characterizations that impact 
the hardness of the problem, such as the information provided by robot measurements 
and the information lost through motion. Also, symmetric environments are more 
difficult than asymmetric ones, due to the higher degree of ambiguity. We will now 
look at specific algorithms and discuss their applicability to the different localization 
problems as defined thus far. 


7.3 MARKOV LOCALIZATION 


Probabilistic localization algorithms are variants of the Bayes filter. The straightfor- 
ward application of Bayes filters to the localization problem is called Markov lo- 
calization. Table 7.1 depicts the basic algorithm. This algorithm is derived from 
the algorithm Bayes filter (Table 2.1 on page 24). Notice that Markov localization 
also requires a map m as input. The map plays a role in the measurement model 
p(z | xı, m) (Line 4). It often, but not always, is incorporated in the motion model 
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Algorithm Markov localization(bel(z; 1), uz, 22, т): 
for all x, do 
bel(zi) = J р(ж | Ue, ia, m) bel(zi 1) dx 
bel(z,) = n p(z | zc, m) bel(ax) 
endfor 


return bel(z,) 


Table 7.1 Markov localization. 


p(zi | ui, 21-1, m) as well (Line 3). Just like the Bayes filter, Markov localization 
transforms a probabilistic belief at time t — 1 into a belief at time t. Markov localiza- 
tion addresses the global localization problem, the position tracking problem, and the 
kidnapped robot problem in static environments. 


The initial belief, bel(xo), reflects the initial knowledge of the robot's pose. It is set 
differently depending on the type of localization problem. 


Position tracking. If the initial pose is known, bel(xo) is initialized by a point- 
mass distribution. Let z denote the (known) initial pose. Then 


M" 1 if 20 = Zo 
бека) = { 0 otherwise CD 


Point-mass distributions are discrete and therefore do not possess a density. 


In practice the initial pose is often just known in approximation. The belief 
bel(xo) is then usually initialized by a narrow Gaussian distribution centered 
around zo. Gaussians were defined in Equation (2.4) on page 11. 


bel(zg) = det (223-3 exp (—i(zo — Zo) 3 (zo — Zo)} 
e AN (zo; o, X) (7.2) 


X is the covariance of the initial pose uncertainty. 


Global localization. If the initial pose is unknown, bel(zo) is initialized by a 
uniform distribution over the space of all legal poses in the map: 


1 


bel(xo) EXT 


(7.3) 
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Figure 7.3 Example environment used to illustrate mobile robot localization: One- 
dimensional hallway environment with three indistinguishable doors. Initially the robot 
does not know its location except for its heading direction. Its goal is to fi nd out where it is. 


where | X | stands for the volume (Lebesgue measure) of the space of all poses 
within the map. 


= Other. Partial knowledge of the robot's position can usually easily be trans- 
formed into an appropriate initial distribution. For example, if the robot is known 
to start next to a door, one might initialize bel(zo) using a density that is zero ex- 
cept for places near doors, where it may be uniform. If it is known to be located 
in a specific corridor, one might initialize bel(zo) by a uniform distribution in the 
area of the corridor and zero anywhere else. 


7.4 ILLUSTRATION OF MARKOV 
LOCALIZATION 


We have already discussed Markov localization in the introduction to this book, as 
a motivating example for probabilistic robotics. Now we can back up this example 
using a concrete mathematical framework. Figure 7.3 depicts our one-dimensional 
hallway with three identically looking doors. The initial belief bel(ao) is uniform over 
all poses, as illustrated by the uniform density in Figure 7.4a. As the robot queries its 
sensors and notices that it is adjacent to one of the doors, it multiplies its belief bel(xo) 
by p(z | 22, т), as stated in Line 4 of our algorithm. The upper density in Figure 7.4b 
visualizes p(z; | £+, m) for the hallway example. The lower density is the result of 
multiplying this density into the robot's uniform prior belief. Again, the resulting 
belief is multi-modal, reflecting the residual uncertainty of the robot at this point. 


As the robot moves to the right, indicated in Figure 7.4c, Line 3 of the Markov lo- 
cations algorithm convolves its belief with the motion model p(x, | u;,2; 1). The 
motion model p(x; | u+, £+—1) is not focused on a single pose but on a whole con- 
tinuum of poses centered around the expected outcome of a noise-free motion. The 
effect is visualized in Figure 7.4c, which shows a shifted belief that is also flattened 
out, as a result of the convolution. 
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Figure 7.4 Illustration of the Markov localization algorithm. Each picture depicts the 
position of the robot in the hallway and its current belief bel(x). (b) and (d) additionally 


depict the observation model p(z¢ | 2+), which describes the probability of observing a 
door at the different locations in the hallway. 
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The final measurement is illustrated in Figure 7.4d. Here the Markov localization 
algorithm multiplies the current belief with the perceptual probability p(z; | т). At 
this point, most of the probability mass is focused on the correct pose, and the robot is 
quite confident of having localized itself. Figure 7.4e illustrates the robot's belief after 
having moved further down the hallway. 


We already noted that Markov localization is independent of the underlying represen- 
tation of the state space. In fact, Markov localization can be implemented using any 
of the representations discussed in Chapter 2. We will now consider three different 
representations and devise practical algorithms that can localize mobile robots in real 
time. We begin with Kalman filters, which represent beliefs by their first and second 
moment. We then continue with discrete, grid representations and finally introduce 
algorithms using particle filters. 


7.5 EKF LOCALIZATION 


The extended Kalman filter localization algorithm, or EKF localization, is a special 
case of Markov localization. EKF localization represent beliefs bel(z,) by their their 
first and second moment, that is, the mean u+ and the covariance X+. The basic ЕКЕ 
algorithm was stated in Table 3.3 in Chapter 3.3. EKF localization shall be our first 
concrete implementation of an EKF in the context of an actual robotics problem. 


Our EKF localization algorithm assumes that the map is represented by a collection 
of features. Thus, at any point in time t, the robot gets to observe a vector of ranges 
and bearings to nearby features: z; = {21, 22,...}. We begin with a localization 
algorithm in which all features are uniquely identifiable. The existence of uniquely 
identifiable features may not be a bad assumption: For example, the Eiffel Tour in 
Paris is a landmark that is rarely confused with other landmarks, апа it is widely visible 
throughout Paris. The identity of a feature is expressed by set of correspondence 
variables, denoted ci, one for each feature vector zi. Correspondence variables were 
already discussed in Chapter 6.6. In our first algorithm, the correspondence is assumed 
to be known. We then progress to a more general version which allows for ambiguity 
among features. Instead of the correspondence, the robot observes a feature signature, 
si. The second, more general version applied a maximum likelihood estimator to 
estimate the value of the latent correspondence variable, and uses the result of this 
estimation as ground truth. 
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Figure 7.5 Application of the Kalman filter algorithm to mobile robot localization. АП 
densities are represented by uni-modal Gaussians. 


7.5.1 Illustration 


Figure 7.5 illustrates the EKF localization algorithm using our example of mobile 
robot localization in the one-dimensional corridor environment (cf. Figure 7.3). To 
accommodate the unimodal shape of the belief in EKFs, we make two convenient 
assumptions: First, we assume that the correspondences are known: we will attach 
unique labels to each door (1, 2, and 3), and we will denote the measurement model 
by р( | zi, c+) where c; € {1,2,3} is the identity of the door observed at time 
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t. Second, we assume that the initial pose is relatively well known. A typical initial 
belief is represented by the Gaussian distribution shown in Figure 7.5a, centered on 
the area near Door 1 and with a Gaussian uncertainty as indicated in that figure. As 
the robot moves to the right, its belief is convolved with the Gaussian motion model. 
The resulting belief is a shifted Gaussian of increased width, as shown in Figure 7.5b. 


Now suppose the robot detects that it is in front of door c; — 2. The upper density in 
Figure 7.5c visualizes p(z; | £+, m, c+) for this observation—again a Gaussian. Fold- 
ing this measurement probability into the robot's belief yields the posterior shown in 
Figure 7.5c. Note that the variance of the resulting belief is smaller than the variances 
of both the robot's previous belief and the observation density. This is natural, since 
integrating two independent estimates should make the robot more certain than each 
estimate in isolation. After moving down the hallway, the robot's uncertainty in its 
position increases again, since the EKF continues to incorporate motion uncertainty 
into the robot's belief. Figure 7.5d shows one of these beliefs. This example illustrates 
the EKF in our limited setting. 


7.5.2 The ЕКЕ Localization Algorithm 


The discussion thus far has been fairly abstract: We have silently assumed the avail- 
ability of an appropriate motion and measurement model, and have left unspecified 
a number of key variables in the EKF update. We will now discuss a concrete im- 
plementation of the EKF, for feature-based maps. Our feature-based maps consist 
of point landmarks, as already discussed in Chapter 6.2. For such point landmarks, 
we will use the common measurement model discussed in Chapter 6.6. We will also 
adopt the velocity motion model defined in Chapter 5.3. The reader may take a mo- 
ment to briefly reacquire the basic measurement and motion equations discussed in 
these chapters before reading on. 


Table 7.2 describes EKF localization known. correspondences, the EKF algorithm 
for localization with known correspondences. This algorithm is derived from the EKF 
in Table 3.3 in Chapter 3. It requires as its input a Gaussian estimate of the robot pose 
at time t — 1, with mean 444, and covariance X, ,. Further, it requires a control 
Ut, а map m, and a set of features 2; = (26, 225 ...] measured at time t, along with 
the correspondence variables c; = {c}, с?,...}. It output is a new, revised estimate 
Ht; Y. 


The individual calculations in this algorithm will be explained further below. Lines 
2 through 4 implement the familiar motion update, using a linearized motion model. 
The predicted pose after motion is calculated as р; in Line 2, and Line 4 computes the 
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1: Algorithm EKF localization known. correspondences(,;..;, 34.1, Ut, Zt, с, M): 
= sin J4—1,0 + t sin(pi 1,0 + wi At) 
2: Dt = ш-1 + тя COS 11,0 — xs cos( 1,0 + QUAL) 
wAt 
1 0 ut COS [441,0 — ue cos(p—1,0 + АФ) 
3 С,= [| 0 1 e sin [441,0 — s sin(Jua 1,0 + w,At) 
0 0 1 
4 Ut = С, 4G; +В 
or 0 0 
5: 0, = 0 o 0 
0 0 o; | "un 
6 for all observed features zi = (ri фі si)? do 
T: je 
8: ee еа 
by Ty Bt 
9: а = 676 
| vd 
10: 2 = | atan2(d,, 62) — lio 
TI s 
| Vqdó; —4qó, 0 
11: Hp by ôs  -1 
q 
0 0 0 
6 d 0 N T 4 4T —1 
12: Ky =>, Ay (H; X Не + О) 
13: endfor 
14: pi = fie + У, КЕ - 2) 
15: y= Uh, кіні) Ў, 
16: return Ht, Ut 


Table 7.2 The extended Kalman fi Iter (ЕКЕ) localization algorithm, formulated here for 
a feature-based map and a robot equipped with sensors for measuring range and bearing. 
This version assumes knowledge of the exact correspondences. 
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Figure 7.6 Example of localization using the extended Kalman filter. The robot moves 
on a straight line. As it progresses, its uncertainty increases gradually, as illustrated by the 
error ellipses. When it observes a landmark with known position, the uncertainty is reduced. 


corresponding uncertainty ellipse. Lines 5 to 15 implement the measurement update. 
The core of this update is a loop through all possible features 7 observed at time t. 
In Line 7, the algorithm assigns to j the correspondence of the i-th feature in the 
measurement vector. It then calculates a predicted measurement 2? and the Jacobian 
Н} of the measurement model. The Kalman gain KŻ is then calculated in Line 12 for 
each observed feature. The sum of all updates is then applied to obtain the new pose 
estimate, as stated in Lines 14 and 15. Notice that the last row of H7 is all zero. This is 
because the signature does not depend on the robot pose. The effect of this degeneracy 
is that the observed signature s} has no effect on the result of ће EKF update. This 
should come at no surprise: knowledge of the correct correspondence 2? renders the 
Observed signature entirely uninformative. 


Figure 7.6 illustrates the EKF localization algorithm in a synthetic environment with 


a single landmark. The robot starts out on the left and accrues uncertainty as it moves. 
Upon seeing the landmark, its uncertainty is gradually reduced as indicated. 


7.5.3 Mathematical Derivation 


To understand the motion update, let us briefly restate the motion model that was 
defined in Equation (5.13): 


sf Ж a sin 0 + md sin(0 + w, At) 
y = y + M cos 0 — "m cos(6 + v At) (7.4) 
6’ 0 Ut AC + А 
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Неге т, = (x y Ө)” and 2; = (a’ y' 0')7 are the state vectors at time t — 1 and t, 
respectively. As before, the control is a vector of two independent velocities: 


ш = e. Ji (1.5) 


where v; is a translational velocity, and о; is a rotational velocity, and At is the time 
window over which the robot motion 1s executed. The variable DR in (7.4) stands for 
the quotient =. 


We already know from Chapter 3 that EKF localization maintains its local posterior 
estimate of the state, represented by the mean и,—_1 and covariance У, 1. We also re- 
call that the “trick” of the ЕКЕ lies in linearizing the motion and measurement model. 
For that, we decompose the motion model into a noise-free part and a random noise 
component with (approximately) zero mean. 


a! x — 2. sin + 2 sin(0 + w,At) 

у = y |+| gtcosd— 2+ cos(6 Fw, At) | +N(0, R(J.6) 
6' 0 wAt 

2+ 9(иь,2+-1) 


This is of the form (3.50) defined in Chapter 3. We recall from that chapter that ЕКЕ 
linearization approximates g through a Taylor expansion: 


gu Ta) © gua) + Gi opa — т) (7.7) 


The function g(u;, е1) is simply obtained by replacing the exact state z,..; —-which 
we do not know—by our expectation j1;_1—which we know. The Jacobian С; 15 the 
derivative of the function g with respect to рш; _ 1, and evaluated at и, and џи: 


z COS J41 1,0 — m cos(pa—1,0 + А1) 

Vt gj — Ut 

22 Sin ру 1,0 — 0 sin(pi 10 + wt At) 
1 


С, = g (us, pai) zc 


оо н 
O ҥе © 


(7.8) 


Here и+—1,ө denotes the third component of the mean vector и+—1, which is the esti- 
mate of the rotation 0;. 


172 CHAPTER 7 


The reader may have noticed that we silently ignored the fact that the amount of noise 
in motion X, depends on the magnitude of robot motion u;. When implementing ЕКЕ 
localization, the covariance R+ is a function of the velocities v; and ш, and the mean 
Ht—1. This issue was already discussed in more depth in Chapter 5, and for the sake 
of brevity we leave the design of an appropriate >, to the reader as an exercise. 


EKF localization also requires a linearized measurement model with additive Gaussian 
noise, as discussed back in Chapter 3. The measurement model for our feature-based 
maps shall be a variant of Equation (6.41) in Chapter 6.6 which presupposes knowl- 
edge of the landmark identity via the correspondence variables. x Let us denote by 7 
the identity of the landmark that corresponds to the ?-th component in the measurement 
vector. Then we have 


a= t 
Si 
М (туь — 2)? + (mj, — y? N(0, or) 
= atan2(mj,y — Y,Mj,2 0) -0 | + | M0, о) (7.9) 
Mj s (о, Os) 
Here (mj, Mj y)” are the coordinates of the landmark observed at time t, and тл; 


is its (correct) signature. The variables r? and ф* denote the range and bearing to this 
landmark, and s? denotes the signature as perceived by the robot. the variances of the 
measurement noise is given by c;., сд, and os, respectively. To bring this into a form 
familiar from Chapter'3, we rewrite this model as follows: 


ze = o h(znjm) + N(0,Qi) (7.10) 


with z; = (x y 0)7 and 


V (m.s — 2)? + (т, — 0)? 


h(r,,j,m) = atan2(m;, — y, mj, — 2) — 0 (7.11) 
Mj s 
and 
o, 0 0 
Q = 0 o, 0 (7.12) 
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The Taylor approximation follows now directly from Equation (3.52): 
h(z, jm) & Аўт) + Hi (xi — fie) (7.13) 


Here Н, is the Jacobian of h at р;, calculated for ће i-the landmark and the map m. 
This approximation substitutes the exact robot pose x; in А(2,, j, m) by the estimate 
Da = (Bas Шу Heo)”. The Jacobian of h is given by the following matrix 


ort Ori Ori 
Oftt.2 — Ohty — Ole 


dy Mo 4 = дф: дф: дф: 
Н; = h (да, j, m) Oi. . EM Diino (7.14) 
Os; дв! 5i 


дв Өш, ӨШ ө 


in which fiz = (Дх Lt y Дө)" denotes the estimate jz; factored into its individual 
three values. Calculating the desired derivatives from Equation (7.11) gives us the 
following matrix: 


TI y m lt, x Ut — me» 0 


ЕСА 
ні = Шу — Yt jx Htm 1 (7.15) 
dt qt 


0 0 0 


with де = (Mj = ja)? + (mj, = еу), and j = cå if the landmark that corresponds 
to the measurement 22. This linear approximation, plugged into the standard ЕКЕ 
equations, leads to lines 7 through 11 in the EKF localization algorithm shown in 


Table 7.2. 


Finally, we note that our feature-based localizer processes multiple measurements at- 
a-time, whereas the ЕКЕ discussed in Chapter 3.2 only processed a single sensor item. 
Our algorithm relies on an implicit conditional independence assumption, which we 
briefly discussed in Chapter 6.6, Equation (6.40). Essentially, we assume that all fea- 
ture measurement probabilities are independent given the pose x, and the map т: 


plal zem) = [[»Giloom (7.16) 
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This is usually a good assumption, especially if the world is static. It enables us to 
add the information from multiple features into our filter, as specified in lines 14 and 
15 in Table 7.2. This slightly non-obvious addition in Table 7.2 stems from the fact 
that multiple features are integrated via Bayes rule (which is a product). The addition 
takes place in information space, in which probabilities are represented logarithmically 
(c.f., Chapter 3.4). The logarithm of a product is a sum, hence the additive form of 
this integration step. 


7.6 ESTIMATING CORRESPONDENCES 


7.6.1 EKF Localization with Unknown 
Correspondences 


The EKF localization discussed thus is only applicable when landmark correspon- 
dences can be determined with absolute certainty. In practice, this is rarely the case. 
Most implementations therefore determine the identity of the landmark during local- 
ization. Throughout this book, we will encounter a number of strategies to cope with 
the correspondence problem. The most simple of all is known as maximum likelihood 
correspondence, in which one first determines the most likely value of the correspon- 
dence variable, and then takes this value for granted. 


Maximum likelihood techniques are brittle if there are many equally likely hypotheses 
for the correspondence variable. However, one can often design the system for this to 
be not the case. To reduce the danger of asserting a false data association, there ex- 
ist essentially two techniques: First, select landmarks that are sufficiently unique and 
sufficiently far apart from each other that confusing them with each other is unlikely. 
Second, make sure that the robot’s pose uncertainty remains small. Unfortunately, 
these two strategies are somewhat counter to each other, and finding the right granu- 
larity of landmarks in the environment can be somewhat of an art. 


Nevertheless, the maximum likelihood technique is of great practical importance. Ta- 
ble 7.3 depicts the EKF localization algorithm with a maximum likelihood estimator 
for the correspondence. The motion update in Lines 2 through 4 is identical to the one 
in Table 7.2. The key difference is in the measurement update: Here we first calculate 
for all landmarks k in the map a number of quantities that enables us to determine 
the most likely correspondence (Lines 6 through 12). The correspondence variable 
is then chosen in Line 14, by minimizing a quadratic Mahalanobis distance function 
defined over the measured feature vector 2? and the expected measurement 27, for any 
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1: Algorithm ЕКЕ localization(u:—1, 34 1, Ut, zt, m): 
= sin 41,0 + Es sin(ji 1,0 + wA) 
2: fe = pui = COS [141,8 — ae cos(pa—1,0 + wi A) 
wA 
1 0 St cosu-1,0 — Zt CoS(pi—1,9 + wA) 
3: С,= | 0 1 e sin [441,0 — m sin(4u 1,9 + w,At) 
0 0 
4: У, = Gi E GI + Ri 
o, 0 0 
21 Qi = 0 og 0 
0 0 o; 
6: for all landmarks k in the map m do 
7: Ok = 7 = Mk a Е Mew ) 
k,y Mk y — Ht,y 
8: Як = б Ôk 
ук 
9: a = atan? (ðk y, бк) — hto 
Mks 
i i VG, Ok — V4, Sky 0 
10: Н; = dk ру бк —1 
0 0 0 
11: V, = HES, [HE]? + Qi 
12: endfor 
13: for all observed features zi = (ri фі st)” do 
14: jli) = argmin(z} — 2)? Фу! (24 — 28) 
k 
15: KER 
16: endfor 
18: X = (1 У, Ki Hi) Ӯ, 
19: return Ht, X 


Table 7.3 The extended Kalman fi Iter (EKF) localization algorithm with unknown corre- 
spondences. The correspondences j (i) are estimated via a maximum likelihood estimator. 
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possible landmark тт in the map. The covariance in this expression is composed of 
the measurement uncertainty, calculated in Line 5, and the robot uncertainty projected 
into the measurement space (Line 11). The final EKF update in Lines 17 and 18 only 
incorporates the most likely correspondences. 


The algorithm in Table 7.2 is inefficient. It can be improved through a more thoughtful 
selection of landmarks in Lines 6 through 12. In most settings, the robot only sees a 
small number of landmarks at a time in its immediate vicinity; and simple tests can 
reject a large number of unlikely landmarks in the map. 


Further, the algorithm can be modified to accommodate outliers. The standard ap- 
proach is to only accept landmarks for which the Mahalanobis distance in Line 14, 
or the associated probability, passes a threshold test. This is generally a good idea: 
Gaussians fall of exponentially, and a single outlier can have a huge effect on the 
pose estimate. In practice, thresholding adds an important layer of robustness to the 
algorithm without which EKF localization tends to be brittle. 


7.6.2 Mathematical Derivation 


The maximum likelihood estimator determines the correspondence that maximizes the 
data likelihood. 


с = argmax p(zi | Cit; M, 21:81, U1:t) (7.17) 
Ct 


Here c; is the correspondence vector at time t. The vector 2; = {21, 22,...} is the 
measurement vector which contains the list of features 2? observed at time t. As 
before, each feature vector now contains three elements, the range, the bearing, and 
the signature: 


AZ = (т Фф 2 (7.18) 


The argmax in (7.17) selects the correspondence vector ё; that maximized the like- 
lihood of the measurement. Note that this expression is conditioned on prior corre- 
spondences ст.1. While those have been estimated in previous update steps, the 
maximum likelihood approach treats those as if they are always correct. This has two 
important ramifications: It makes it possible to update the filter incrementally. But it 
also introduces brittleness in the filter, which tends to diverge when correspondence 
estimates is erroneous. 
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The likelihood p(z | сул, M, Z1:t—1, Чы) in Equation (7.17) is now easily computed 
from the belief bel(z,) by integrating over the pose x+, and omitting irrelevant condi- 
tioning variables: 


p(zi | сүд, M, £33 1. Чы) 


[oe | Cit, Lt, M, 214-1, 1) plz | Cit; M, Z1:t—1; Чы) dai 


= е | cc, £t, M) poi | eiiam, Z1:t—1, 014) dz 


| 


[oe | ct, £t, M) bel (x+) da (7.19) 


This yields the following maximization 
Сүт = argmax | ple | ct, £6, M) bel(x+) da, (7.20) 


This maximization is carried out over the entire correspondence vector c+. 


Unfortunately, there are exponentially many terms in this maximization. When the 
number of features per measurement is large, the number of possible feature vectors 
may grow too large for practical implementations. The most common technique to 
avoid such an exponential complexity performs the maximization separately for each 
individual feature 2? in the measurement vector 2; = (21, 22,...}. 


ё& = argmax i plzi | сї, ze, m) bel (xz) dz; (7.21) 


The implications of this component-wise optimization will be discussed below; we 
note that it is "justified" only when we happen to know that individual feature vectors 
are conditionally independent—an assumption that is usually adopted for convenience, 
and that we will discuss further below. 


р(& |сж,т) = [|0 |, =, т) (7.22) 


This assumption is analogous to the one stated in Equation (6.40). Under this as- 
sumption, the term that is being maximized in (7.20) becomes a product of terms with 
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disjoint optimization parameters, for which the maximum is attained when each indi- 
vidual factor is maximal. 


The measurement probability for each 2? is given by the following exponential func- 
tion: 


p(zi | ci, жү, m) (7.23) 
= mgexpí(-i(z — h(x, dm) Ог! (2 — һ(ж,,с,т))} 


where A is defined as in (7.11). Applying once again our Taylor expansion as in (3.52) 
gives us 


plil ram) = техр1—21(21— Һр, с, т) – Н, (ж — n) Qi 
(zi — h(fiz, cj, m) — Н, (zi — fiz))} (7.24) 


Thus, the integral (7.21) can be written as 


Ji n exp{—Li(z}, Ci, ж, m)) day (7.25) 
with 
MENS) == (ш E h(i, ci, m) E Hi (zi pe ja)? Qi 
(zi — h(n cj, m) — Н, (a+ — fit) 
d 5 (ж — pa)? it (ж, — fit) (7.26) 


We already encountered integrals of this form in Chapter 3.2, where we derived the 
motion update of the Kalman filter and the EKF. The closed-form solution to this in- 
tegral is derived completely analogously to those derivations. In particular, the Gaus- 
sian defined by (7.25) has mean h(jiz, сї, m) and covariance Н, У, HT + Qi. Thus, 
we have under our linear approximation the following closed form expression for the 
measurement likelihood: 


[oe | ci, x4, m) bel(z;) dx, ~ N (h(n, ci, m), Hi 5, HE +Q) (7.27) 
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and thus 


р(& | Cit, M, 14-1, U1) (7.28) 
= qm exp (3 (z = Мда, c, m))" ІН, У, н + Q^ (21 ES h(i, cj, m))] 


Since we only seek the value for сї that maximizes this function, it suffices to maxi- 
mize the quadratic term in the exponential: 


& = argmax (2; — Һи, ej, m))” [Hi Ў, HE + Qi (2 — his ei, m) 


i 
er 


(7.29) 


This calculation is implemented in Line 15 in Table 7.3. 


This distribution is remarkably similar to the probability calculated by the algorithm 
landmark model known. correspondence in Table 6.4, the only difference arising 
from the covariance term. In Table 6.4, the robot pose is assumed to be known, hence 
the covariance reduces to the measurement covariance Q+. Here we only have a prob- 
abilistic estimate of the pose, hence have to use our best estimate for the pose, and fold 
in the uncertainty in our estimate. Our state estimate is given by ju. As the deriva- 
tion above shows, the covariance is adjusted by the additional uncertainty Н, X; H7, 
which is the projection of the pose uncertainty under the linearized approximation of 
the measurement function h. This shows the correctness of the calculation in lines 12 
and 13 in our EKF algorithm in Table 7.3: тк is indeed the desired likelihood. The cor- 
rectness of the algorithm follows from our assumption that feature measurements are 
conditionally independent, as stated in Equation (7.22). Of course, this independence 
is usually violated, which makes our algorithm approximate. A further approximation 
is introduced by the Taylor expansion. 


7.7 MULTI-HYPOTHESIS TRACKING 


There exist a number of extensions of the basic EKF to accommodate situations where 
the correct data association cannot be determined with sufficient reliability. Several of 
those techniques will be discussed later in this book, hence our exposition at this point 
will be brief. 


A classical technique that overcomes difficulties in data association is the Multi- 
hypothesis Tracking Algorithm (MHT). The MHT can represent a belief by multiple 
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Gaussians, that is, the posterior is represented by the mixture 


1 at 23 
bel(x+) p» pidet (2лУ; т) 2 exp f- iG — pia) Dep (we — i)) 


y» Wel 1 
(7.30) 


Неге l is the index of the mixture component. Each such component, or "track" in 
МНТ slang, is itself a Gaussian with mean p; and covariance >; ү. The scalar v; ; > 
0 is a mixture weight. It determines the weight of the /-th mixture component in the 
posterior. Since the posterior is normalized by У ı 11, each v; is a relative weight, 
and the contribution of the /-th mixture component depends on the magnitude of all 
other mixture weights. 


As we shall see below when we describe the MHT algorithm, each mixture component 
relies on a unique sequence of data association decisions. Hence, it makes sense to 
write c, ; for the data association vector associated with the /-th track, and сі. for all 
past and present data associations associated with the /-th mixture component. With 
this notation, we can now think of mixture components as contributing local belief 
functions conditioned on a unique sequence of data associations: 


bel(x4) =  p(zi | 214, mis cun) (7.31) 


Неге сїз = ieu: COT is. T denotes the sequence of correspondence vectors as- 
sociated with the /-th track. 


Before describing the MHT, it makes sense to discuss a completely intractable algo- 
rithm from which the MHT is derived. This algorithm is the full Bayesian implemen- 
tation of the EKF under unknown data association. It is amazingly simple: Instead 
of selecting the most likely data association vector, our fictitious algorithm maintains 
them all. More specifically, at time ¢ each mixture is split into many new mixtures, 
each conditioned on a unique correspondence vector c;. Let m be the index of one of 
the new Gaussians, and / be the index from which this new Gaussian is derived, for 
the correspondence c; ;. The weight of this new mixture is then set to 


Wem = ы plz | C14 1,5 Ctm; Ž1:t—1; U1:t) (7.32) 


This is the product of the mixture weight v; ; from which the new component was 
derived, times the likelihood of the measurement z; under the specific correspondence 
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vector that led to the new mixture component. In other words, we treat correspon- 
dences as latent variable and calculate the posterior likelihood that a mixture compo- 
nent is correct. A nice aspect of this approach is that we already know how to compute 
the measurement likelihood p(z | C1:t—1,1, Ctm; 21:21, U1:+) in Equation (7.32): It is 
simply the product [| ть of the individual feature likelihoods computed in line 13 of 
our localization algorithm in Table 7.3. Thus, we can incrementally calculate the mix- 
ture weights for each new component. The only downside of this algorithm is the fact 
that the number of mixture components, or tracks, grow exponentially over time. 


The MHT algorithm approximates this algorithm by keeping the number of mixture 
components small. In essence, it terminates every component whose relative mixture 
weight 


VI 


7.33 
2525 Ut т ( ) 


is smaller than a threshold Win. It is easy to see that the number of mixture com- 
ponents is always at most Val. Thus, the MHT maintains a compact posterior that 
can be updated efficiently. It is approximate in that it maintains a very small number 
of Gaussians, but in practice the number of plausible robot locations is usually very 
small. 


We will omit a formal description of the MHT algorithm at this point, and instead 
refer the reader to a large number of related algorithms in this book. However, when 
implementing the MHT, it is useful to devise strategies for identifying low-likelihood 
tracks before instantiating them. 


7.8 PRACTICAL CONSIDERATIONS 


The EKF localization algorithm and its close relative, MHT localization, are popular 
techniques for position tracking. There exist a large number of variations of these 
algorithm that enhance their efficiency and robustness. 


m Efficient search. First, it is often impractical to loop through all landmarks k in 
the map. Often, there exist simple tests to identify plausible candidate landmarks 
(e.g., by simply projecting the measurement into z-y-space), enabling one to rule 
out all but a constant number of candidates. Such algorithms can be orders of 
magnitude faster that our naive implementations. 
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= Mutual exclusion. A key limitation of our implementations arises from our as- 
sumed independence of feature noise in the EKF (and, by inheritance, the MHT). 
The reader may recall condition (7.22), which enabled us to process individual 
features sequentially, thereby avoiding a potential exponential search through the 
space of all correspondence vectors. Unfortunately, such an approach allows for 
assigning multiple observed features, say z} and 27 with i Æ j, to be assigned to 
the same landmark in the map: ĉ = ё*. For many sensors, such a correspondence 
assignment is wrong by default. For example, if the feature vector is extracted 
from a single camera image, we know by default that two different regions in the 
image space must correspond to different locations in the physical world. Put dif- 
ferently, we usually know that i Z j — ci 4 ĉ. This (hard!) constraint is called 
mutual exclusion principle in data association. It reduces the space of all possi- 
ble correspondence vectors. Advanced implementations consider this constraint. 
For example, one might first search for each correspondence separately—as in 
our version of the ЕКЕ localizer—followed by a “repair” phase in which viola- 
tions of the mutual exclusion principle are resolved by changing correspondence 
values accordingly. 


= Outliers. Further, our implementation does not address the issue of outliers. 
The reader may recall from Chapter 6.6 that we allow for a correspondence c — 
N + 1, with N being the number of landmarks in the map. Such an outlier 
test is quite easily added to our algorithms. In particular, if we set туф to be 
the a prior probability of an outlier, the argmax-step in line 15 EKF localization 
(Table 7.3) will default to N + 1 if an outlier is the most likely explanation of the 
measurement vector. Clearly, an outlier does not provide any information on the 
robot's pose; hence, the corresponding terms are simply omitted in lines 18 and 
19 in Table 7.3. 


Both the EKF and the MHT localization are only applicable to position tracking prob- 
lems. In particular, linearized Gaussian techniques tend to work well only if the posi- 
tion uncertainty is small. There are three complimentary reasons for this observation: 


m  Auni-modal Gaussian is usually a good representation of uncertainty in tracking 
whereas it is not in more general global localization problems. Both the EKF and 
the MHT start with a single unimodal Gaussian, although the MHT can poten- 
tially branch into multiple local Gaussian. 


m  Anarrow Gaussian reduces the danger of erroneous correspondence decisions. 
This is important particularly for the EKF, since a single false correspondence 
can derail the tracker by inducing an entire stream localization and correspon- 
dence errors. The MHT is more robust to this problem, though it can fail equally 
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when the correct correspondence is not among those maintained in the Gaussian 
mixture. 


m Тһе Taylor expansion is usually only good in a close proximity to the lineariza- 
tion point. As a rule of thumb, if the standard deviation for the orientation 0 is 
larger then +20 degrees, linearization effects are likely to make the algorithm 
fail. This problem applies equally to the EKF and the MHT and explains why 
starting the MHT with a very wide initial Gaussian does not turn it into a global 
localization algorithm. 


For all those reasons, the Gaussian localization algorithms discussed in this chapter 
are inapplicable to global localization problems or the kidnapped robot problem. 


The design of the appropriate features for EKF localization is a bit of an art. This is 
because multiple competing objectives have to be met. On the one hand, one wants 
sufficiently many features in the environment, so that the uncertainty in the robot's 
pose estimate can be kept small. Small uncertainty is absolutely vital for reasons 
already discussed. On the other hand, one wants to minimize chances that landmarks 
are confused with each other, or that the landmark detector detects spurious features. 
Many environments do not possess too many point landmarks that can be detected 
with high reliability, hence many implementation rely on relatively sparsely distributed 
landmarks. Here the MHT has a clear advantage, in that it is more robust to data 
association errors. As a rule of thumb, large numbers of landmarks tend to work 
better than small numbers even for the EKF. When landmarks are dense, however, it 
is critical to apply the mutual exclusion principle in data association. 


Finally, we note that EKF localization processes only a subset of all information in 
the sensor measurement. By going from raw measurements to features, the amount 
of information that is being processed is already drastically reduced. Further, ЕКЕ 
localization is unable to process negative information. Negative information pertains 
to the absence of a feature. Clearly, not seeing a feature when one expects to see it 
carries relevant information. For example, not seeing the Eiffel Tour in Paris implies 
that it is unlikely that we are right next to it. The problem with negative information 
is that it induces non-Gaussian beliefs, which cannot be represented by the mean and 
variance. For this reason, EKF implementations simply ignore the issue of negative 
information, and instead integrate only information from observed features. The stan- 
dard MHT also avoids negative information. However, it is possible to fold negative 
information into the mixture weight, by decaying mixture components that failed to 
observe a landmark. 


With all these limitations, does this mean that Gaussian techniques are generally brittle 
and inapplicable to more general localization techniques? The answer is no. In fact, 
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the key to successful localization lies in the approach for data association. Later in this 
book, we will encounter more sophisticated techniques for handling correspondences 
than the ones discussed thus far. Many of these techniques are applicable (and will be 
applied!) to Gaussian representations, and the resulting algorithms are often among 
the best ones known. 


7.9 SUMMARY 


In this chapter, we introduced the mobile robot localization problem and devised a first 
practical algorithm for solving it. 


m Тһе localization problem is the problem of estimating a robot's pose relative to a 
known map of its environment. 


m Position tracking addresses the problem of accommodating the local uncertainty 
of a robot whose initial pose is known; global localization is the more general 
problem of localizing a robot from scratch. Kidnapping is a localization problem 
in which a well-localized robot is secretly teleported somewhere else without 
being told—it is the hardest of the three localization problems. 


m The hardness of the localization problem is also a function of the degree to which 
the environment changes over time. АП algorithms discussed thus far assume a 
static environment. 


m Passive localization approaches are filters: they process data acquired by the 
robot but do not control the robot. Active techniques control the robot. In this 
and the next chapter, we study passive approaches; active approaches will be 
discussed in Chapter ?? of this book. 


= Markov localization is just a different name for the Bayes filter applied to the 
mobile robot localization problem. 


= EKF localization applies the extended Kalman filter to the localization problem. 
EKF localization is primarily applied to feature-based maps. 


= The most common technique for dealing with correspondence problems is the 
maximum likelihood technique. This approach simply assumes that at each point 
in time, the most likely correspondence is correct. 


m The multi hypothesis tracking algorithm (MHT) pursues multiple correspon- 
dences, using a Gaussian mixture to represent the posterior. mixture components 
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are created dynamically, and terminated if their total likelihood sinks below a 
user-specified threshold. 


= Тһе MHT is more robust to data association problems than the ЕКЕ, at an in- 
creased computational cost. 


= Both EKF and MHT localization are well-suited for local position tracking prob- 
lems with limited uncertainty and in environments with distinct features. They 
are less applicable to global localization or in environments where most objects 
look alike. 


m Selecting features for EKFs and MHTs requires skill! The performance of both 
algorithms can be improved by a number of measures, such as enforcing mutual 
exclusion in data association. 


In the next chapter, we will discuss probabilistic localization techniques that can cope 
with more general localization problems, such as the global localization problem or 
the problem of localizing in dynamic environments. 
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8 


GRID AND MONTE CARLO 
LOCALIZATION 


8.1 INTRODUCTION 


This chapter describes two localization algorithms that are capable of solving global 
localization problems. These algorithms possess a number of differences to the Gaus- 
sian techniques discussed in the previous chapter. 


m They can process raw sensor measurements. There is no need to extract fea- 
tures from sensor values. As a direct implication, they can also process negative 
information. 


m They are non-parametric. In particular, they are not bound to a uni-modal distri- 
bution as was the case with the EKF localizer. 


m They can solve global localization and—in some instances—kidnapped robot 
problems. Neither the EKF nor the MHT are able to solve such problems— 
although the MHT can be modified so as to solve global localization problems. 


The techniques presented here have exhibited excellent performance in a number of 
fielded robotic systems. 


The first approach is called grid localization. It uses a histogram filter to represent the 
posterior belief. A number of issues arise when implementing grid localization: with 
a fine-grained grid, the computation required for a naive implementation may make 
the algorithm unreasonably slow. With a coarse grid, the additional information loss 
through the discretization negatively affects the filter and—if not properly treated— 
may even prevent the filter from working. 
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1: Algorithm Grid localization({p;, ; 1), we, zt, т): 

2: for all k do 

3: Prt = py m motion. model(mean(x;), uz, mean(x;)) 
4: Dk, = 1 measurement model(z;, mean(xy), m) 

5: endfor 

6: return {рк} 


Table 8.1 Grid localization, a variant of the discrete Bayes filter. The function mo- 
tion. model implements one of the motion models, and measurement.model a sensor 
model. The function “mean” returns the center of gravity of a grid cell x р. 


The second approach is the Monte Carlo localization (MCL) algorithm, arguably the 
most popular approach to date. It uses particle filters to estimate posteriors over robot 
poses. A number of shortcomings of the MCL are discussed, and techniques for ap- 
plying it to the kidnapped robot problem and to dynamic environments are presented. 


The material in this chapter covers some of the most successful methods to date. Ap- 
propriately implemented, these techniques are able to localize robots globally, and to 
recover from localization failure. These abilities make the algorithms presented here 
the method of choice in many applications that require reliable robot operation. 


8.2 GRID LOCALIZATION 


8.2.1 Basic Algorithm 


Grid localization approximates the posterior using a histogram filter over a grid de- 
composition of the pose space. The discrete Bayes filter was already extensively dis- 
cussed in Chapter 4.1 and is depicted in Table 4.1. It maintains as posterior a collection 
of discrete probability values 


bel(z,) = {Pkt} (8.1) 
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where each probability p; ғ is defined over a grid cell хр. The set of all grid cells 
forms a partition of the space of all legitimate poses: 


range(X;,) = xi:UXa4U ...XkK, (8.2) 


In the most basic version of grid localization, the partitioning of the space of all poses 
is time-invariant, and each grid cell is of the same size. A common granularity used in 
many of the indoor environments is 15 centimeters for the x- and y-dimensions, and 
5 degrees for the rotational dimension. A finer representation yields better results, but 
at the expense of increased computational requirements. 


Grid localization is largely identical to the basic binary Bayes filter from which it 
is derived. Table 8.1 provides pseudo-code for the most basic implementation. It 
requires as input the discrete probability values {p:_1,,}, along with the most recent 
measurement, control, and the map. Its inner loop iterates through all grid cells. Line 
3 implements the motion model update, and line 4 the measurement update. The 
final probabilities are normalized, as indicated by the normalizer ņ in line 4. The 
functions motion model, and measurement model, may be implemented by any of 
the motion models in Chapter 5, and measurement models in Chapter 6, respectively. 
The algorithm in Table 8.1 assumes that each cell possesses the same volume. 


Figure 8.1 illustrates grid localization in our one-dimensional hallway example. This 
diagram is equivalent to that of the general Bayes filter, except for the discrete nature of 
the representation. As before, the robot starts out with global uncertainty, represented 
by a uniform histogram. As it senses, the corresponding grid cells raise its probability 
value. The example highlights the ability to represent multi-modal distributions with 
grid localization. 


8.2.2 Grid Resolutions 


A key variable of the grid localizer is the resolution of the grid. On the surface, this 
might appear to be a minor detail; however, the type sensor model that is applicable, 
the computation involved in updating the belief, and the type results to expect all 
depend on the grid resolution. 


At the extreme end are two types of representations, both of which have been brought 
to bear successfully in fielded robotics systems. 
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Figure 8.1 Grid localization using a fine-grained metric decomposition. Each picture 
depicts the position of the robot in the hallway along with its belief bel(z:), represented by 


a histogram over a grid. 
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Grid Environment 


Figure 8.2 Example of a fi xed-resolution grid over the robot pose variables x, у, and 0. 
Each grid cell represents a robot pose in the environment. Different orientations of the robot 
correspond to different planes in the grid (shown are only three orientations). 


= Coarse, variable-resolution grids. Some implementation decompose the space 
of all poses into regions that correspond to "significant" places in the environ- 
ment. Such places may be defined by the presence (or absence) of specific land- 
marks, such as doors and windows. In hallway environments, places may corre- 
spond to intersections, T-junctions, dead ends, and so on. In such representations, 
the resolution of the decomposition depends on the structure of the environment, 
and they tend to be course. Figure 8.5 shows such a coarse representation for 
the one-dimensional hallway example. Course representation like these are com- 
monly associated with topological representations of space. 


m Fine fixed-resolution grids. Other methods decompose the state space using 
equally spaced grids. The resolution of such decompositions is usually much 
higher than that of variable-resolution grids. For example, some of the examples 
in Chapter 7 use grid decompositions with cell sizes of 15 centimeters or less. 
Hence, they are more accurate, but at the expense of increased computational 
costs. Figure 8.2 illustrates such a fixed-resolution grid. Fine resolution like 
these are commonly associated with metric representation of space. 


When implementing grid localization for coarse resolutions, it is important to compen- 
sate for the coarseness in the resolution in the sensor and motion models. In particular, 
for a high-resolution sensor like a laser range finder, the value of the measurement 
model p(z; | £+) may vary drastically inside each grid cell x; ;. If this is the case, just 
evaluating it at the center-of-gravity will generally yield a poor result. Similarly, pre- 
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Figure8.3 Average localization error as a function of grid cell size, for ultrasound sensors 
and laser range-fi nders. 


dicting robot motion from the center-or-gravity may yield poor results: If the motion 
is updated in 1-second intervals for a robot moving at 10cm/sec, and the grid resolu- 
tion is 1 meter, the naive implementation will never result in a state transition. This is 
because any location that is approximately 10cm away form the center-or-gravity of a 
grid cell still falls into the same grid cell. 


A common way to compensate this effect is to modify both the measurement and the 
motion model by inflating the amount of noise. For example, the variance of a range 
finder model's main Gaussian cone may be enlarged by half the diameter of the grid 
cell. In doing so, the new model is much smoother, and its interpretation will be 
less susceptible to the exact location of the sample point relative to the correct robot 
location. However, this modified measurement model reduces the information intake, 
thereby reducing the localization accuracy. Similarly, a motion model may predict a 
random transition to a nearby cell with a probability that it proportional to the length of 
the motion arc, divided by the diameter of a cell. The result of such an inflated motion 
model is that the robot can indeed move from one cell to another, even if its motion 
between consecutive updates is small relative to the size of a grid cell. However, the 
resulting posteriors are wrong in that an unreasonably large probability will be placed 
on the hypothesis that the robot changes cell at each motion update—and hence moves 
much much faster than commanded. 
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Figure 8.4 Average CPU-time needed for global localization as a function of grid resolu- 
tion, shown for both ultrasound sensors and laser range-fi nders. 


Figures 8.3 and 8.4 plot the performance of grid localization as a function of the res- 
olution, for two different types of range sensors. As to be expected, the localization 
error increases as the resolution decreases. The total time necessary to localize a robot 
decreases as the grid becomes coarser, as shown in Figure 8.4. 


8.2.3 Computational Considerations 


When using a fine-grained grid such as some of the metric grids described in the 
previous section, the basic algorithm cannot be executed in real-time. At fault are both 
the motion and the measurement update. The motion update requires a convolution, 
which for a 3-D grid is a 6-D operation. The measurement update is a 3-D operation, 
but calculating the likelihood of a full scan is a costly operation. 


There exist a number of techniques to reduce the computational complexity of grid 
localization. 


m  Pre-caching. The measurement model is costly since it may involve ray cast- 
ing. As motivated in Chapter 6.3.4, a common strategy is to calculate for each 
grid cell essential statistics that facilitate the measurement update. In particular, 
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Figure 8.5 Application of a coarse-grained, topological representation to mobile robot 
localization. Each state corresponds to a distinctive place in the environment (a door in this 
case). The robot’s belief bel (x1) of being in a state is represented by the size of the circles. 
(a) The initial belief is uniform over all poses. (b)b shows the belief after the robot made 
one state transition and detected a door. At this point, it is unlikely that the robot is still in 
the left position. 


when using a beam model, it is common to cache away the correct range for 
each center-of-gravity point in each grid cell. Further, the sensor model can be 
precalculated for a fine-grained array of possible ranges. The calculation of the 
measurement model reduces then to two table lookups, which is much faster. 


Sensor subsampling. Further speed-ups can be achieved by evaluating the mea- 
surement model only for a subset of all ranges. In some of our systems, we use 
only 8 of our 360 laser range measurement and still achieve excellent results. 
Subsampling can take place spatially and in time. 


Delayed motion updates. Instead of applying the motion update every time a 
control is issued or an odometry reading is obtained, it is possible to reduce the 
frequency of motion update. This is achieved by geometrically integrating the 
controls or odometry readings over short time period. This can speed up the 
algorithm by an order of magnitude. 


Selective updating. This technique was already described in Chapter 4.1.3. 
When updating the grid, selective techniques update a fraction of all grid cells 
only. A common implementation of this idea updates only those grid cells whose 
posterior probability exceeds a user-specified threshold. Selective updating tech- 
niques can reduce the computational effort involved in updating beliefs by many 
orders of magnitude. Special care has to be taken to reactivate low-likelihood 
grid cells when one seeks to apply this approach to the kidnapped robot problem. 
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With these modifications, grid localization can in fact become quite efficient; Even 10 
years ago, low-end PC were fast enough to generate the results shown in this chapter. 
However, our modifications place an additional burden on the programmer, and make 
a final implementation more complex than the short algorithm in Table 8.1 suggests. 


8.2.4 Illustration 


Figure 8.6 shows an example of Markov localization with metric grids, at a spatial 
resolution of 15 centimeter and an angular resolution of 5 degrees. Shown there is 
a global localization run where a mobile robot equipped with two laser range-finders 
localizes itself from scratch. The probabilistic model of the range-finders is computed 
by the beam model described in Section 6.3 and depicted in Table 8.1. 


Initially, the robot's belief is uniformly distributed over the pose space. Figure 8.6a 
depicts a scan of the laser range-finders taken at the start position of the robot. Here, 
max range measurements are omitted and the relevant part of the map is shaded in 
grey. After incorporating this sensor scan, the robot's location is focused on just a few 
regions in the (highly asymmetric) space, as shown by the gray-scale in Figure 8.6b. 
Notice that beliefs are projected into x-y space; the true belief is defined over a third 
dimension, the robot's orientation 0, which is omitted in this and the following dia- 
grams. Figure 8.6d shows the belief after the robot moved 2m, and incorporated the 
second range scan shown in Figure 8.6c. The certainty in the position estimation in- 
creases and the global maximum of the belief already corresponds to the true location 
of the robot. After integrating another scan into the belief the robot finally perceives 
the sensor scan shown in Figure 8.6e. Virtually all probability mass is now centered at 
the actual robot pose (see Figure 8.6f). Intuitively, we say that the robot successfully 
localized itself. This example illustrates that grid localization is capable to globally 
localize a robot very efficiently. A second example is shown in Figure 8.7. 


Of course, global localization usually requires more than just a few sensor scans to 
succeed. This is particularly the case in symmetric environments, and if the sensors 
are less accurate than laser sensors. Figures 8.8 to 8.10 illustrate global localization 
using a mobile robot equipped with sonar sensors only, and in an environment that 
possesses many corridors of approximately the same width. An occupancy grid map 
is shown in Figure 8.8. Figure 8.9a shows the data set, obtained by moving along one 
of the corridors and then turning into another. Each of the "beams" in Figure 8.9a cor- 
responds to a sonar measurement. In this particular environment, the walls are smooth 
and a large fraction of sonar readings are corrupted. Again, the probabilistic model 
of the sensor readings is the beam-based model described in described in Section 6.3. 
Figure 8.9 additionally shows the belief for three different points in time, marked “A,” 
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Figure 8.6 Global localization in a map using laser range-fi nder data. (a) Scan of the 
laser range-fi nders taken at the start position of the robot (max range readings are omitted). 
Figure (b) shows the situation after incorporating this laser scan, starting with the uniform 


distribution. (c) Second scan and (d) resulting belief. After integrating the fi nal scan shown 
in (e), the robot's belief is centered at its actual location (see (f)). 
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Room C 


(b) 


(d) 


(e) (f) 


Figure 8.7 Global localization in an offi ce environment using sonar data. (a) Path of the 
robot. (b) Belief as the robot passes position 1. (c) After some meters of robot motion, the 
robot knows that it is in the corridor. (d) As the robot reaches position 3 it has scanned 
the end of the corridor with its sonar sensors and hence the distribution is concentrated on 
two local maxima. While the maximum labeled I represents the true location of the robot, 
the second maximum arises due to the symmetry of the corridor (position II is rotated by 
180? relative to position I). (e) After moving through Room A, the probability of being at 
the correct position I is now higher than the probability of being at position II. (f) Finally 
the robot's belief is centered on the correct pose. 
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Figure 8.8 Occupancy grid map of the 1994 AAAI mobile robot competition arena. 


“B; and “С” in Figure 8.9a. After moving approximately three meters, during which 
the robot incorporates 5 sonar scans, the belief is spread almost uniformly along all 
corridors of approximately equal size, as shown in Figure 8.9b. A few seconds later, 
the belief is now focused on a few distinct hypotheses, as depicted in Figure 8.9c. 
Finally, as the robot turns around the corner and reaches the point marked “С” the 
sensor data is now sufficient to uniquely determine the robot’s position. The belief 
shown in Figure 8.9d is now closely centered around the actual robot pose. This ex- 
ample illustrates that the grid representation works well for high-noise sonar data and 
in symmetric environments, where multiple hypotheses have to be maintained during 
global localization. 


Figure 8.10 illustrates the ability of the grid approach to correct accumulated dead- 
reckoning errors by matching sonar data with occupancy grid maps. Figure 8.10a 
shows the raw odometry data of a 240m long trajectory. Obviously, the rotational 
error of the odometry quickly increases. After traveling only 40m, the accumulated 
error in the orientation (raw odometry) is about 50 degrees. Figure 8.10b shows the 
path of the robot estimated by the grid localizer. 


Obviously, the resolution of the discrete representation is a key parameter for grid 
Markov localization. Given sufficient computing and memory resources, fine-grained 
approaches are generally preferable over coarse-grained ones. In particular, fine- 
grained approaches are superior to coarse-grained approaches, assuming that sufficient 
computing time and memory is available. As we already discussed in Chapter 2.4.4, 
the histogram representation causes systematic error that may violate the Markov as- 
sumption in Bayes filters. The finer the resolution, the less error is introduced, and the 
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(b) 


(d) 


Figure 8.9 (а) Data set (odometry and sonar range scans) collected in the environment 
shown in Figure 8.8. This data set is suffi cient for global localization using the grid local- 
ization. The beliefs at the points marked ‘4,’ ‘B” and “С” are shown in (b) - (d). 


(b) 


Figure 8.10 (a) Odometry information and (b) corrected path of the robot. 
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1 Algorithm МСІ (2,1, 0, zi, m): 

2 4 -A4«-0 

3 for m — 1 to M do 

4 al") = sample_motion_model(u;, 21") 
5: wl = measurement model(z;, ol), m) 
6 A = Ж + (afl, wh") 

7 endfor 

8 for m = 1 to M do 

9: draw i with probability oc w!” 

10: add z” to 2, 

11: endfor 

12: return 2, 


Table 8.2 MCL, or Monte Carlo Localization, a localization algorithm based on particle 
fi Iters. 


better the results. Fine-grained approximations also tend to suffer less from “catas- 
trophic" failures where the robot's belief differs significantly from its actual position. 
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8.3.1 The MCL Algorithm 


We will now turn our attention to a popular localization algorithm which represents 
the belief bel(z,) by particles. The algorithm is called Monte Carlo Localization, or 
MCL. Like grid-based Markov localization, MCL is applicable to both local and global 
localization problems. Despite its relatively short existence, MCL has already become 
one of the most popular localization algorithms in robotics. It is easy to implement, 
and tends to work well across a broad range of localization problems. 


Table 8.2 shows the basic MCL algorithm, which is obtained by substituting the appro- 
priate probabilistic motion and perceptual models into the algorithm particle filters 
(Table 4.3 on page 78). The basic MCL algorithm represents the belief bel(z,) by a set 


of M particles A, = (all, all КЕ al, Lines 4 in our algorithm (Table 8.2) sam- 
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ples from the motion model, using particles from present belief as starting points. The 
beam measurement model is then applied in line 5 to determine the importance weight 
of that particle. The initial belief bel(xo) is obtained by randomly generating M such 
particles from the prior distribution p(xo), and assigning the uniform importance fac- 
tor MT! to each particle. As in grid localization, the functions motion. model, and 
measurement model, may be implemented by any of the motion models in Chapter 5, 
and measurement models in Chapter 6, respectively. 


Figure 8.11 illustrates MCL using the one-dimensional hallway example. The initial 
global uncertainty is achieved through a set of pose particles drawn at random and 
uniformly over the entire pose space, as shown in Figure 8.11a. As the robot senses 
the door, line 5 of in the algorithm MCL assigns importance factors to each particle. 
The resulting particle set is shown in Figure 8.11b. The height of each particle in this 
figure shows its importance weight. It is important to notice that this set of particles 
is identical to the one in Figure 8.11a—the only thing modified by the measurement 
update are the importance weights. 


Figure 8.11c shows the particle set after resampling (line 8-11 in the algorithm MCL) 
and after incorporating the robot motion (line 4). This leads to a new particle set 
with uniform importance weights, but with an increased number of particles near the 
three likely places. The new measurement assigns non-uniform importance weights 
to the particle set, as shown in Figure 8.11d. At this point, most of the cumulative 
probability mass is centered on the second door, which is also the most likely location. 
Further motion leads to another resampling step, and a step in which a new particle 
set is generated according to the motion model (Figure 8.11e). As should be obvious 
from this example, the particle sets approximate the correct posterior, as would be 
calculated by an exact Bayes filter. 


Figure 8.12 shows the result of applying MCL in an actual office environment, for 
a robot equipped with an array of sonar range finders. The figure depicts particle 
sets after 5, 28, and 55, meters of robot motion, respectively. Each particle set is 
defined over the 3-dimensional pose space, although only the x- and y-coordinates 
of each particle are shown. А second illustration is provided in Figure 8.13, here 
using a camera pointed towards the ceiling, and a measurement model that relates the 
brightness in the center of the image to a previously acquired ceiling map. 


8.3.2 Properties of MCL 


MCL can approximate almost any distribution of practical importance. It is not bound 
to a limited parametric subset of distributions, as was the case for EKF localization. 


CHAPTER 8 


202 


pim 


CUR 


Е 


p(z|x) 


p(z|x) 


Ee 


(d) 
veces pes 


ESSO === O00 


Figure 8.11 Monte Carlo Localization, a particle fi Iter applied to mobile robot localiza- 


tion. 
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Figure 8.12 Illustration of Monte Carlo localization: Shown here is a robot operating in 
an offi ce environment of size 54m x 18m. (a) After moving 5m, the robot is still highly 
uncertain about its position and the particles are spread trough major parts of the free-space. 
(b) Even as the robot reaches the upper left corner of the map, its belief is still concentrated 
around four possible locations. (c) Finally, after moving approximately 55m, the ambiguity 
is resolved and the robot knows where it is. All computation is carried out in real-time on a 
low-end PC. 
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Figure 8.13 Global localization using a camera pointed at the ceiling. 


The accuracy of the approximation is easily determined by the size of the particle set 
M. Increasing the total number of particles increases the accuracy of the approxima- 
tion. The number of particles M is a parameter that enables the user to trade off the 
accuracy of the computation and the computational resources necessary to run MCL. 
А common strategy for setting M is to keep sampling until the next pair и; and 2; has 
arrived. In this way, the implementation is adaptive with regards to the computational 
resources: the faster the underlying processor, the better the localization algorithm. 
Such a resource-adaptivity is difficult to achieve for grid localization and Gaussian 
techniques. 


A final advantage of MCL pertains to the non-parametric nature of the approximation. 
As our illustrative results suggest, MCL can represent complex multi-modal probabil- 
ity distributions, and blend them seamlessly with focused Gaussian-style distributions. 
This provides MCL with the ability to solve global localization problems with high- 
accuracy position tracking. 


8.3.3 Random Particle MCL: Recovery from 
Failures 
MCL, in its present form, solves the global localization problem but cannot recover 


from robot kidnapping, or global localization failures. This quite obvious from the 
results in Figure 8.12: As the position is acquired, particles at places other than the 
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most likely pose gradually disappear. At some point, particles only “survive” near a 
single pose, and the algorithm is unable to recover if this pose happens to be incorrect. 


This problem is significant. In practice, any stochastic algorithm such as MCL may 
accidentally discard all particles near the correct pose during the resampling step. 
This problem is particularly paramount when the number of particles is small (e.g., 
M = 50), and when the particles are spread over a large volume (e.g., during global 
localization). 


Fortunately, this problem can be solved by a rather simple heuristic. The idea of 
this heuristic is to add random particles to the particle sets. Such an "injection" of 
random particles can be justified mathematically by assuming that the robot might get 
kidnapped with a small probability, thereby generating a fraction of random states in 
the motion model. Even if the robot does not get kidnapped, however, the random 
particles add an additional level of robustness. 


The approach of adding particles raises two questions. First, how many particles 
should be added at each iteration of the algorithm and, second, from which distri- 
bution should we generate these particles? One might add a fixed number of random 
particles at each iteration. A better idea is to add particles based on some estimate of 
the localization a accuracy. One way to implement this idea is to monitor the proba- 
bility of sensor measurements 


р(& | zii, Ut, m) (8.3) 


and relate it to the average measurement probability (which is easily learned from 
data). In particle filters, an approximation to this quantity is easily obtained from the 
importance factor, since, by definition, an importance weight is a stochastic estimate 
of p(z | 2.1, ш, m). The mean 


M 
1 т 
p(zl|z-iuwm) m 5 ul ] (8.4) 


thus approximates the desired quantity. It is usually a good idea to smooth this estimate 
by averaging it over multiple time steps. There exist multiple reasons why this mean 
may be poor in addition to poor position tracking: the amount of sensor noise might 
be unnatural high, or the particles may still be spread out during a global localization 
phase. For these reasons, it is a good idea to maintain a short-term average of the 
measurement likelihood, and relate it to the long-term average when determining the 
number of random samples. 
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1 Algorithm Augmented MCL(X,. 1, uz, zt, т): 

2 static Wslow, fast 

3 А, = А, = 0 

4: for m — 1 to M do 

5: al"! = sample. motion. model(u,, 7!" ) 

6 wl” = measurement model(:z;, ol"), m) 
7 A = А, + (xl), wl!) 

8: Wavg = Wave + d; wl 

9: endfor 

10: Wslow = slow + Oslow (Wave E Wslow ) 

11: Wfast = fast + fast (Wave E Utast) 

12: for m — 1 to M do 

13: with probability max(0.0, 1.0 — wast /Wslow) do 
14: add random pose to +; 

15: else 

16: draw i € {1,..., N} with probability ox w!” 
17: add a!" to А, 

18: endwith 

19: endfor 

20: return 2, 


Table 8.3 Ап adaptive variant of MCL that adds random samples. The number of random 
samples is determined by comparing the short-term with the long-term likelihood of sensor 
measurements. 


The second problem of determining which sample distribution to use, can be addressed 
in two ways. One can draw particles according to a uniform distribution over the pose 
space, and then weight them with the current observation. For some sensor models, 
however, it is impractical to generate particles directly in accordance to the measure- 
ment distribution. One example of such a sensor model is the landmark detection 
model discussed in Chapter 6.6. In this case the additional particles can be placed di- 
rectly at locations distributed according to the observation likelihood (see Table 6.5). 


Table 8.3 shows a variant of the MCL algorithm that adds random particles. This al- 
gorithm is adaptive, in that it tracks the short-term and the long-term average of the 
likelihood p(z; | 2.1, ше, rn). Its first part is identical to the algorithm MCL in Ta- 
ble 8.2: New poses are samples from old particles using the motion model (line 5), and 
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their importance weight is set in accordance to the measurement model (line 6). How- 
ever, Augmented MCL calculates the empirical measurement likelihood in line 8, and 
maintains short-term and long-term averages of this likelihood in lines 10 and 11. The 
algorithm requires that 0 < orow K Оғаѕ+. The parameters ооу, and fast, are decay 
rates for the exponential filters that estimate the long-term, and short-term, averages, 
respectively. The crux of this algorithm can be found in line 13: During the resampling 
process, a random sample is added with probability max(0.0, 1.0— Wfast /Wslow ); oth- 
erwise, resampling proceeds in the familiar way. The probability of adding a random 
sample takes into consideration the divergence between the short- and the long-term 
average of the measurement likelihood. If the short-term likelihood is better or equal 
to the long-term likelihood, no random sample is added. However, if the short-term 
likelihood is worse than the long-term one, random samples are added in proportion 
to the quotient of these values. In this way, a sudden decay in measurement likeli- 
hood induces an increased number of random samples. The exponential smoothing 
counteracts the danger of mistaking momentary sensor noise for a poor localization 
result. 


Figure 8.15 illustrates our augmented MCL algorithm in practice. Shown there is a se- 
quence of particle sets during global localization and relocalization of a legged robot 
equipped with a color camera, and operating on a 3x2m field as it is used in RoboCup 
soccer competitions. Sensor measurements correspond to the detection and relative 
localization of six visual markers placed around the field. The algorithm described in 
Table 6.4 is used to determine the likelihood of detections. Step 14 in Figure 8.3 is re- 
placed by an algorithm for sampling according to the most recent sensor measurement, 
which is easily implemented for our feature-based sensor model. 


Figures 8.15a-d illustrate global localization. At the first marker detection, virtually 
all particles are drawn according to this detection (Figure 8.15b). This step corre- 
sponds to situation in which the short-term average of the measurement probability is 
much worse than its long-term correspondent. After several more detections, the par- 
ticles are clustered around the true robot position (Figure 8.15d), and both the short- 
and long-term average of the measurement likelihood increases. At this stage of local- 
ization, the robot is merely tracking its position, the observation likelihoods are rather 
high, and only a very small number random particles are occasionally added. 


As the robot is physically carried to a different location—a common even in robotic 
soccer tournaments—the measurement probability drops. The first marker detection 
at this new location does not yet trigger any additional particles, since the smoothed 
estimate wast is still high (see Figure 8.15e). After several marker detections observed 
at the new location, was. decreases much faster than wsjow and more random particles 
are added (Figure 8.15f,g). Finally, the robot successfully relocalizes itself as shown 
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Figure8.15 Mont Carlo localization with random particles. Each picture shows a particle 
set representing the robot's position estimate (small lines indicate the orientation of the 
particles). The large circle depicts the mean of the particles, and the true robot position is 
indicated by the small, white circle. Marker detections are illustrated by arcs centered at 
the detected marker. The pictures illustrate global localization (a)-(d) and relocalization 
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Figure 8.16 Left diagram: plain MCL (top curve), MCL with random samples (center 
curve), and MCL with mixture proposal distribution (bottom curve). The error rate is mea- 
sured in percentage of time during which the robot lost track of its position, for a data set 
acquired by a robot operating in a crowded museum. Right: Error as a function of time for 
standard MCL and mixture MCL, using a ceiling map for localization. 


in Figure 8.15h, demonstrating that our augmented MCL algorithm is indeed capable 
of "surviving" kidnapping. 


8.3.4 Modifying the Proposal Distribution 


A related limitation of MCL arises from its proposal mechanism; in fact, it inherits this 
deficiency from the vanilla particle filter. As discussed in Chapter 4.2.4, the particle 
filter uses the motion model as proposal distribution, but it seeks to approximate a 
product of this distribution and the perceptual likelihood. The larger the difference 
between the proposal and the target distribution, the more samples are needed. 


In MCL, this induces a surprising failure mode: If we were to acquire a perfect sensor 
which—without any noise—always informs the robot of its correct pose, MCL would 
fail. This is even true for noise-free sensors that do not carry sufficient information for 
localization. An example of the latter would be a 1-D noise-free range sensor: When 
receiving such a range measurement, the space of valid pose hypotheses will be a 2-D 
subspace of the 3-D pose space. We already discussed in length in Section 4.2.4 that 
chances to sample into this 2-D submanifold are zero when sampling from the robot 
motion model. Thus, we face the strange situation that under certain circumstances, a 
less accurate sensor would be preferable to a more accurate sensor when using MCL 
for localization. This is not the case for ЕКЕ localization, since the ЕКЕ update takes 
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the measurements into account when calculating the new mean—instead of generating 
mean(s) from the motion model alone. 


Luckily, a simple trick provides remedy: Simply use a measurement model that arti- 
ficially inflates the amount of noise in the sensor. One can think of this inflation as 
accommodating not just the measurement uncertainty, but also the uncertainty induced 
by the approximate nature of the particle filter algorithm. 


An alternative, more sound solution involves a modification of the sampling process 
which we already discussed briefly in Section 4.2.4. The idea is that for a small frac- 


tion of all particles, the role of the motion model and the measurement model are 
reversed: Particles are generated according to the measurement model 


al") р(а |а) (8.5) 


and the importance weight is calculated in proportion to 


vw" = / pal”! | ше, же 1 Беа 1) dia (8.6) 


This new sampling process is a legitimate alternative to the plain particle filter. It 
alone will be inefficient since it entire ignores the history when generating particles. 
However, it is equally legitimate to generate a fraction of the particles with either of 
those two mechanisms and merge the two particle sets. The resulting algorithm is 
called MCL with mixture distribution, or mixture MCL. In practice, it tends to suffice 
to generate a small fraction of particles (e.g., 596) through the new process. 


Unfortunately, our ides does not come without challenges. The two main steps— 
sampling from p(z; | x+) and calculating the importance weights wl". сап be dif- 
ficult to realize. Sampling from the measurement model is only easy if its inverse 
possesses a closed form solution from which it is easy to sample. This is usually not 
the case: imagine sampling from the space of all poses that fit a given laser range scan! 
Calculating the importance weights is complicated by the integral in (8.6), and by the 


fact that bel(z,..1) is itself represented by a set of particles. 


Without delving into too much detail, we note that both steps can be implemented, 
but only with additional approximations. Figure 8.16 shows comparative results for 
MCL, MCL augmented with random samples, and mixture MCL for two real-world 
data sets. In both cases, p(z; | x+) was itself learned from data and represented by 
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a density tree—an elaborate procedure whose description is beyond the scope of this 
book. For calculating the importance weights, the integral was replaced by a stochastic 
integration, and the prior belief was continued into a space-filling density by convolv- 
ing each particle with a narrow Gaussian. Details aside, these results illustrate that the 
mixture idea yields superior results, but it can be challenging to implement. 


We also note that the mixture MCL provides a sound solution to the kidnapped robot 
problem. By seed-starting particles using the most recent measurement only, we con- 
stantly generate particles at locations that are plausible given the momentary sensor 
input, regardless of past measurements and controls. There exist ample evidence in 
the literature that such approaches can cope well with total localization failure (Fig- 
ure 8.16b happens to show one such failure for regular MCL), hence provides im- 
proved robustness in practical implementations. 


8.4 LOCALIZATION IN DYNAMIC 
ENVIRONMENTS 


A key limitation of all localization algorithms discussed thus far arises from the static 
world assumption, or Markov assumption. Most interesting environments are popu- 
lated by people, and hence exhibit dynamics not modeled by the state x;. To some 
extent, probabilistic approaches are robust to such unmodeled dynamics, due to their 
ability to accommodate sensor noise. However, as previously noted, the type sen- 
sor noise accommodated in the probabilistic filtering framework must be independent 
at each time step, whereas unmodeled dynamics induce effects on the sensor mea- 
surements over multiple time steps. When such effects are paramount, probabilistic 
localization algorithms that rely on the static world assumption may fail. 


A good example of such a failure situation is shown in Figure 8.17. This example in- 
volves a mobile tour-guide robot, navigating in museums full of people. The people— 
their locations, velocities, intentions etc.—are hidden state relative to the localization 
algorithm which is not captured in the algorithms discussed thus far. Why is this prob- 
lematic? Imagine people lining up in a way that suggests the robot is facing a wall. As 
a result, with each single sensor measurement the robot increases its belief of being 
next to a wall. Since information is treated as independent, the robot will ultimately 
assign high likelihood to poses near walls. Such an effect is possible with independent 
sensor noise, but its likelihood is vanishingly small. 


There exist two fundamental techniques for dealing with dynamic environments. The 
first technique includes the hidden state into the state estimated by the filter; the 
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Figure 8.17 Scenes from the Deutsches Museum Bonn, where the mobile robot “Rhino” 
was frequently surrounded by dozens of people. 


other preprocesses sensor measurements to eliminate measurements affected by hid- 
den state. The former methodology is mathematically the more general one: Instead 
of just estimating the robot's pose, one can define filter that also estimates people's 
positions, their velocities, etc. In fact, we will later on discuss such an approach, as an 
extension to a mobile robot mapping algorithm. 


The principle disadvantage of estimating the hidden state variables lies in its compu- 
tational complexity: Instead of estimating 3 variables, the robot must now calculate 
posteriors over a much larger number of variables. In fact, the number of variables 
itself is a variable, as the number of people may vary over time. Thus, the resulting al- 
gorithm will be substantially more involved than the localization algorithms discussed 
thus far. 


The alternative—preprocessing sensor data—works well in certain limited situations, 
which includes situations where people’s presence may affect range finders or (to a 
lesser extent) camera images. Here we develop it for the beam-based range finder 
model from Chapter 6.3. 


The idea is to investigate the cause of a sensor measurement, and to reject those likely 
to be affected by unmodeled environment dynamics. The sensor models discussed thus 
far all address different, alternative ways by which a measurement can come into exis- 
tence. If we manage to associate specific ways with the presence of unwanted dynamic 
effects—such as people—all we have to do it to discard those measurements that are 
with high likelihood caused by such an unmodeled entity. This idea is surprisingly 
general. In Equation(6.13), Chapter 6.3, we defined the beam-based measurement 
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Figure 8.18 Laser range scans are often heavily corrupted when people surround the 
robot. How can a robot maintain accurate localization under such circumstances? 


model for range finders as a mixture of four terms: 


T 
Zhit Dhit GE | Tt, m) 
k Zshort Dshort (21 | z,,m) 
z | t,m) = . 8.7 
p( ў | е Zmax Diac 27 | 2+,т) ( ) 
тапа Prana (2Ë | Tt, m) 


As our derivation of the model clearly states, One of those terms, the one involv- 
ing Zshort апа Pshort, corresponds to unexpected objects. To calculate the probabil- 
ity that a measurement z7 corresponds to an unexpected object, we have to intro- 
duce a new correspondence variable, c? which can take on one of the four values 
{hit, short, max, rand}. 


The posterior probability that the range measurement z7 corresponds to a “short” 
reading—our mnemonic from Chapter 6.3 for unexpected obstacle—is then obtained 
by applying Bayes rule and subsequently dropping irrelevant conditioning variables: 


p(cf = short | 2, 214-1, Ur mM) 


p(z£ | cf = short, z141, u14, m) p(ct = short | zi4—3,u14,m) 


Уу v(t | ё = с, z12-1, ure m) p(& = c | ziii; uias m) 
с 
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p(z | e = short, Z]1:t—15 U1:t; m) p(ct = short) 


X par | e) = c, ziii arm) p(er = с) 
с 


(8.8) 


Here the variable c in the denominator takes on any of the four values 
{hit, short, max, rand}. Using the notation in Equation (8.7), the prior p(cf = с) 
is given by the variables Zhit, Zshort; 2max» and Zrana, for the four different values of 
c. The remaining probability in (8.8) is obtained by integrating out the pose z+: 


p(z? | e = с, 21:01, Urt M) 
= Ке | Te, CY = с, 21:4—1, 01:1, m) p(zi | e = с, 21:4—1, Чы, m) ах, 
x [oe | Te, CY = с, т) plz | Pic toe M) ах, 


= [oe | 71, CF = c, m) bel(z;) da, (8.9) 


Probabilities of the form p(z7 | x+, c = с, т) were abbreviated as Phit, Pshorts Pmax» 
and Prana in Chapter 6.3. This gives us the expression for desired probability (8.8): 


ПЕЕ, | ж, тт) Zshort bel(z,) dz, 


[Dect | т, m) Ze bel(x,) da, 
(8.10) 


p(z} E short | oe Z1:—15 Чи m) 


In general, these integrals do not possess closed-form solutions. To evaluate them, 
it suffices to approximate them with a representative sample of the posterior bel(z;) 
over the state x+. Those samples might be high-likelihood grid cells in grid localizer, 
or particles in a MCL algorithm. The measurement is then rejected if its probability 
of being caused by an unexpected obstacle exceeds a user-selected threshold x. 


Table 8.4 depicts an implementation of this technique in the context of particle filters. 
It requires as input a particle set 2% representative of the belief bel(z;), along with a 
range measurement z7 and a map. It returns “reject” if with probability larger than x 
the measurement corresponds to an unexpected object; otherwise it returns "accept." 
This routine precedes the measurement integration step in MCL. 
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1 Algorithm test range measurement(:7 , A, m): 

2 p=q=0 

3 form = 1 to M do 

4 P= p + nit pnit(2® | 21", т) 

5: q4 = 9 т Zhit " Puis (27 | 21" ; m) T Zshort * Pshort (2f | ud , m) 
6 +Zmax ' Duas | 219 ; m) T 2гапа ' Beandlat | 21" ; m) 
7 endfor 

8 ifp/q € x then 

9: return accept 

10: else 

11 return reject 

12 endif 


Table8.4 Algorithm for testing range measurements in dynamic environment. 


Figure 8.19 illustrates the effect of the filter. Shown in both panels are a range scan, for 
a different alignment of the robot pose. The lightly shaded scans are above threshold 
and rejected. A key property of our rejection mechanism is that it tends to filter out 
measurements that are “surprisingly” short, but leaves others in place that are “sur- 
prisingly" long. This asymmetry reflects the fact that people's presence tends to cause 
shorter-than-expected measurements. By accepting surprisingly long measurements, 
the approach maintains its ability to recover from global localization failures. 


Figure 8.20 depicts an episode during which a robot navigates through an environment 
that densely populated with people (see Figure 8.18). Shown there is the robot's esti- 
mated path along with the endpoints of all scans incorporated into the localizer. This 
figure shows the effectiveness of removing measurements that do not correspond to 
physical objects in the map: there are very few "surviving" range measurements in 
the freespace for the right diagram, in which measurements are accepted only if they 
surpass the threshold test. 


As a rule of thumb, filtering out measurements is generally a good idea.There exist al- 
most no static environments; even in office environments furniture is moved, doors are 
opened/closed, etc. Our specific implementation here benefits from the asymmetry of 
range measurements: people make measurements shorter, not longer. When applying 
the same idea to other data (e.g., vision data) or other types of environment modifi- 
cations (e.g., the removal of a physical obstacle), such an asymmetry might not exist. 
Nevertheless, the same probabilistic analysis is usually applicable. The disadvantage 
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(a) (b) 


Figure 8.19 Illustration of our measurement rejection algorithm: Shown in both diagrams 
are range scans (no max-range readings). Lightly shaded readings are fi Itered out. 


of the lack of such a symmetry might be that it becomes impossible to recover from 
global localization failures, as every surprising measurement is rejected. In such cases, 
it may make sense to impose additional constraints, such as a limit on the fraction of 
measurements that may be corrupted. 


We note that our rejection test has found successful application even in highly static 
environments, for reasons that are quite subtle. The beam-based sensor model is dis- 
continuous: Small changes of pose can drastically alter the posterior probability of a 
sensor measurement. This is because the result of ray casting is not a continuous func- 
tion in pose parameters such as the robot orientation. In environment with cluttered 
objects, this discontinuity increases the number of particle necessary for successful lo- 
calization. By manually removing clutter from the map—and instead letting the filter 
manage the resulting "surprising" short measurements—the number of particles can 
be reduced drastically. The same strategy does not apply to the likelihood field model, 
since this model is smooth in the pose parameters. 


85 PRACTICAL CONSIDERATIONS 


Table 8.5 summarizes the main localization techniques discussed in this and the previ- 
ous chapter. When choosing a technique, a number of requirements have to be traded 
off. A first question will always be whether it is preferable to extract features from 
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(а) 


аре: ы зы жылай final position 


ли 


Distance at final position: 19 cm 
Certainty at final position: 0.003 


(b) 


Distance at final position: 1 cm 
Certainty at final position: 0.998 


Figure 8.20 Comparison of (a) standard MCL and (b) MCL with the removal of sensor 
measurements likely caused by unexpected obstacles. Both diagrams show the robot path 
and the end-points of the scans used for localization. 


sensor measurement. Extracting features may be beneficial from a computational per- 
spective, but it comes at the price of reduced accuracy and robustness. 


EKF MHT Coarse (topo- | fine (metric) | MCL 
logical) grid grid 
Measurements landmarks landmarks landmarks raw raw 
measurements | measurements 
Measurement Gaussian Gaussian any any any 
noise 
Posterior Gaussian mixture of | histogram histogram particles 
Gaussians 
Effi ciency ++ ++ + — + 
(memory) 
Effi ciency (time) ++ + + — + 
Ease of | + — + — ++ 
implementation 
Resolution ++ ++ — + + 
Robustness — T + ++ ++ 
Global no no yes yes yes 
localization 


Table 8.5 Comparison of different implementations of Markov localization. 
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While in this chapter, we developed techniques for handling dynamic environments 
in the context of the MCL algorithm, similar ideas can be brought to bear with other 
localization techniques as well. In fact, the techniques discussed here are only repre- 
sentative of a much richer body of approaches. 


When implementing a localization algorithm, it is worthwhile to play with the various 
parameter settings. For example, the conditional probabilities are often inflated when 
integrating nearby measurements, so as to accommodate unmodeled dependencies that 
always exist in robotics. А good strategy is to collect reference data sets, and tune the 
algorithm until the overall result is satisfactory. This is necessary because no mat- 
ter how sophisticated the mathematical model, there will always remain unmodeled 
dependencies and sources of systematic noise that affect the overall result. 


8.6 SUMMARY 


In this chapter, we discussed two families of probabilistic localization algorithms, grid 
techniques and Monte Carlo localization (MCL). 


= Спа techniques represent posteriors through histograms. 


m Тһе coarseness of the grid trades off accuracy and computational efficiency. For 
course grids, it is usually necessary to adjust the sensor and motion models to 
account for effects that arise from the coarseness of the representation. For fine 
grids, it may be necessary to update grid cells selectively to reduce the overall 
computation. 


= The Monte Carlo localization algorithm represents the posterior using particles. 
The accuracy-computational costs trade-off is achieved through the size of the 
particle set. 


m Both grid localization and MCL can globally localize robots. 
= Ву adding random particles, MCL also solves the kidnapped robot problem. 


=  Mixture-MCL is an extension which inverts the particle generation process for a 
fraction of all particles. This yields improved performance specifically for robots 
with low-noise sensors, but at the expense of a more complex implementation. 


=  Unmodeled environment dynamics can be accommodated by filtering sensor 
data, rejecting those that with high likelihood correspond to an unmodeled ob- 
ject. When using range sensors, the robot tends to reject measurements that are 
surprisingly short. 
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8.7 EXERCISES 


Derive the additive form of the multi-feature information integration in lines 14 and 
15 in Table 7.2. 
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9 


OCCUPANCY GRID MAPPING 


9.1 INTRODUCTION 


The previous two chapters discussed the application of probabilistic techniques to a 
low-dimensional perceptual problem, that of estimating a robot's pose. We assumed 
that the robot was given a map in advance. This assumption is legitimate in quite a 
few real-world applications, as maps are often available a priori or can be constructed 
by hand. Some application domains, however, do not provide the luxury of coming 
with an a priori map. Surprisingly enough, most buildings do not comply with the 
blueprints generated by their architects. And even if blueprints were accurate, they 
would not contain furniture and other items that, from a robot's perspective, determine 
the shape of the environment just as much as walls and doors. Being able to learn a 
map from scratch can greatly reduce the efforts involved in installing a mobile robot, 
and enable robots to adapt to changes without human supervision. In fact, mapping is 
one of the core competencies of truly autonomous robots. 


Acquiring maps with mobile robots is a challenging problem for a number of reasons: 


m Тһе hypothesis space, that is the space of all possible maps, is huge. Since maps 
are defined over a continuous space, the space of all maps has infinitely many 
dimensions. Even under discrete approximations, such as the grid approximation 
which shall be used in this chapter, maps can easily be described 10? or more 
variables. The sheer size of this high-dimensional space makes it challenging 
to calculate full posteriors over maps; hence, the Bayes filtering approach that 
worked well for localization is inapplicable to the problem of learning maps, at 
least in its naive form discussed thus far. 
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= Learning maps is a “chicken-and-egg” problem, for which reason is often re- 
ferred to as the simultaneous localization and mapping (SLAM) or concurrent 
mapping and localization problem problem. When the robot moves through its 
environment, it accumulates errors in odometry, making it gradually less certain 
as to where it is. Methods exist for determining the robot's pose when a map 
is available, as we have seen in the previous chapter. Likewise, constructing a 
map when the robot's poses are known is also relatively easy—a claim that will 
be substantiated by this chapter and subsequent chapters. In the absence of both 
an initial map and exact pose information, however, the robot has to do both: 
estimating the map and localizing itself relative to this map. 


Of course, not all mapping problems are equally hard. The hardness of the mapping 
problem is the result of a collection of factors, the most important of which are: 


m Size. The larger the environment relative to the robot's perceptual range, the 
more difficult it is to acquire a map. 


m Noise in perception and actuation. If robot sensors and actuators were noise- 
free, mapping would be a simple problem. The larger the noise, the more difficult 
the problem. 


= Perceptual ambiguity. The more frequently different places look alike, the more 
difficult it is to establish correspondence between different locations traversed at 
different points in time. 


= Cycles. Cycles in the environment are particularly difficult to map. If a robot just 
goes up and down a corridor, it can correct odometry errors incrementally when 
coming back. Cycles make robots return via different paths, and when closing 
the cycle the accumulated odometric error can be huge! 


To fully appreciate the difficulty of the mapping problem, consider Figure 9.1. Shown 
there is a data set, collected in a large indoor environment. Figure 9.1a was generated 
using the robot's raw odometry information. Each black dot in this figure corresponds 
to an obstacle detected by the robot's laser range finder. Figure 9.1b shows the result 
of applying mapping algorithms to this data, including the techniques described in this 
chapter. This example gives a good flavor of problem at stake. 


In this chapter, we first study the mapping problem under the restrictive assumption 
that the robot poses are known. Put differently, we side-step the hardness of the SLAM 
problem by assuming some oracle informs us of the exact robot path during mapping. 
We will discuss a popular family of algorithms, collectively called occupancy grids. 
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Figure 9.1 (а) Raw range data, position indexed by odometry. (b) Map 


Occupancy grid maps address the problem of generating consistent maps from noisy 
and uncertain measurement data, under the assumption that the robot pose is known. 
The basic idea of the occupancy grids is to represent the map as a field of random 
variables, arranged in an evenly spaced grid. Each random variable is binary and 
corresponds to the occupancy of the location is covers. Occupancy grid mapping 
algorithms implement approximate posterior estimation for those random variables. 


The reader may wonder about the significance of a mapping technique that requires 
exact pose information. After all, no robot's odometry is perfect! The main utility of 
the occupancy grid technique is in post-processing: Many of the SLAM techniques 
discussed in subsequent chapters do not generate maps fit for path planning and navi- 
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gation. Occupancy grid maps are often used after solving the SLAM problem by some 
other means, and taking the resulting path estimates for granted. 


9.2 THE OCCUPANCY GRID MAPPING 
ALGORITHM 


To gold standard of any occupancy grid mapping algorithm is to calculate the posterior 
over maps given the data 


p(m | 21:4, 21:4) (9.1) 


As usual, m is the map, 21. the set of all measurements up to time t, and 71.4 is the 
path of the robot, that is, the sequence of all its poses. The controls u;.; play no role 
in occupancy grid maps, since the path is already known. Hence, they will be omitted 
throughout this chapter. 


The types maps considered by occupancy grid maps are fine-grained grids defined over 
the continuous space of locations. By far the most common domain of occupancy grid 
maps are 2-D floorplan maps, which describe a 2-D slice of the 3-D world. 2-D maps 
are often sufficient, especially when a robot navigates on a flat surface and the sensors 
are mounted so that they capture only a slice of the world. Occupancy grid techniques 
generalize to 3-D representations, but at significant computational expenses. 


Let m; denote the grid cell with index т. An occupancy grid map partitions the space 
into finitely many grid cells: 


m = Ут, (9.2) 


Each mj has attached to it a binary occupancy value, which specifies whether a cell 
is occupied or free. We will write “1” for occupied and “0” for free. The notation 
p(m; = 1) or p(m,) will refer to a probability that a grid cell is occupied. 


The problem with the posterior (9.1) is its dimensionality: the number of grid cells in 
maps like the one shown in Figure 9.1 are in the tens of thousands, hence the number 
of maps defined in this space is often in the excess of 210000, Calculating a posterior 
for each single such map is therefore intractable. 
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The standard occupancy grid approach breaks down the problem of estimating the map 
into a collection of separate problems, namely that of estimating 


p(mi | 21:4, 21:4) (9.3) 


for all grid cell m;. Each of these estimation problems is now a binary problem with 
static state. This decomposition is convenient but not without problems. In particular, 
it does enable us to represent dependencies among neighboring cells; instead, the 
posterior over maps is approximated as the product of its marginals: 


p(m | Pigs) = p(m; | Pity Die) (9.4) 


We will return to this issue in Section 9.4 below, when we discuss more advanced 
mapping algorithms. For now, we will adopt this factorization for convenience. 


Thanks to our factorization, the estimation of the occupancy probability for each grid 
cell is now a binary estimation problem with static state. A filter for this problem 
was already discussed in Chapter 4.1.4, with the corresponding algorithm depicted in 
Table 4.2 on page 75. The algorithm in Table 9.1 applies this filter to the occupancy 
grid mapping problem. As in the original filter, our occupancy grid mapping algorithm 
uses the log-odds representation of occupancy: 


(m; | Zl: 214) 


p(mi | Zl: 21) 


li = dgl- (9.5) 


This representation is already familiar from Chapter 4.1.4. The advantage of the log- 
odds over the probability representation is that we can avoid numerical instabilities for 
probabilities near zero or one. The probabilities are easily recovered from the log-odds 
ratio: 


1 


1 — ———-—— 9.6 
1+ ехр{1,;} um) 


p(m; | 21:2; 214) 


The algorithm оссирапсу grid. mapping in Table 9.1 loops through all grid cells i, 
and updates those that fall into the sensor cone of the measurement z;. For those where 
it does, it updates the occupancy value by virtue of the function inverse sensor model 
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1: Algorithm occupancy grid mapping((l 1 ;), £t, 24): 

2: for all cells m; do 

3: if m; in perceptual field of z, then 

4: lii = l-1, + inverse sensor model(mi, £+, zt) — lo 
5: else 

6: leg = ha 

T: endif 

8: endfor 

9: 


return {4} 


Table 9.1 The occupancy grid algorithm, a version of the binary Bayes fi lter in Table 4.2. 


in line 4 of the algorithm. Otherwise, the occupancy value remains unchanged, as 
indicated in line 6. The constant lọ is the prior of occupancy represented as a log-odds 
ratio: 


(m; = 1) 


p p(m;) 
lg = 1 РЕР кела 
£ n p(m; = 0) 


= log Lun) (9.7) 


The function inverse sensor model implements the inverse measurement model 
p(mi | zi, x+) in its log-odds form: 


inverse sensor model(m;, 21, zt) = p(m; | 21,21) (9.8) 


A somewhat simplistic example of such a function for range finders is given in Ta- 
ble 9.2 and illustrated in Figure 9.7a&b. This model assigns to all cells within the 
sensor cone whose range is close to the measured range an occupancy value of locc. In 
Table 9.2, the width of this region is controlled by the parameter o, and the opening 
angle of the beam is given by 5. 


The algorithm occupancy. grid mapping calculates the inverse model by first deter- 
mining the beam index Ё and the range r for the center-of-mass of the cell m;. This 
calculation is carried out in lines 2 through 5 in Table 9.2. As usual, we assume that 
the robot pose is given by x; = (x у 0)7. In line 7, it returns the prior for occupancy 
in log-odds form whenever the cell is outside the measurement range of this sensor 
beam, or if it lies more than a/2 behind the detected range gr In line 9, it returns 
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(a) ЕЕН (b) 


Figure 9.2 Two examples of our inverse 
measurement model inverse. range.sensor model for two different measurement ranges. 
The darkness of each grid cell corresponds to the likelihood of occupancy. 


loce > lo is the range of the cell is within ta/2 of the detected range z5: It returns 
кее < lo if the range to the cell is shorter than the measured range by more than a/2. 
The left and center panel of Figure 9.7 illustrates this calculation for a sonar beam 
with a 15? opening angle. 


Figures ?? shows an example map next to a blueprint of a large open exhibit hall, and 
relates it to the occupancy map acquired by a robot. The map was generated using 
laser range data gathered in a few minutes' time. The grey-level in the occupancy map 
indicates the posterior of occupancy over an evenly spaced grid: The darker a grid cell, 
the more likely it is occupied. While occupancy maps are inherently probabilistic, they 
tend to quickly converge to estimates that are close to the two extreme posteriors, 1 and 
0. In comparison between the learned map and the blueprint, the occupancy grid map 
shows all major structural elements, and obstacles as they were visible at the height 
of the laser. Close inspection alleviates discrepancies between the blueprint and the 
actual environment configuration. 


Figure 9.4 compares a raw dataset with the occupancy grid maps generated from this 
data. The data in Panel (a) was preprocessed by a SLAM algorithm, so that the poses 
align. Some of the data is corrupted by the presence of people; the occupancy grid map 
filters out people quite nicely. This makes occupancy grid maps much better suited 
for robot navigation than sets of scan endpoint data: A planner fed the raw sensor 
endpoints would have a hard time finding a path through such scattered obstacles, 
even if the evidence that the corresponding cell is free outweigh that of occupancy. 
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(a) 


(b) 


Figure9.3 (а) Occupancy grid map and (b) architectural blue-print of a large open exhibit 
space. Notice that the blue-print is inaccurate in certain places. 


Occupancy Grid Mapping 229 


Figure 9.4 (а) Raw laser range data with correct(ed) pose information. Each dot corre- 
sponds to a detection of an obstacle. Most obstacles are static (walls etc.), but some were 
people that walked in the vicinity of the robot. (b) Occupancy grid map built from the data. 
The grey-scale indicates the posterior probability: Black corresponds to occupied with high 
certainty, and white to free with high certainty. The grey background color represents the 
prior. Figure (a) has been generated by Steffen Gutmann. 
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1 Algorithm inverse range sensor model(;, x+, z+): 
2 Let x;, y; be the center-of-mass of m; 

3 т = (zi - £)? + (y - y)? 

4: ф = atan2(y; — y, zi — 2) — 0 

5: k — argmin; | — 05 sens| 

6: ifr > min(zmax, zë + @/2) or |ф — Ox sens| > 8/2 then 
7 return lo 

8 if zE < zmax and |r — Zmax| < 0/2 

9: return сс 

10: ifr < 2Ё 

11: return жее 

12: endif 


Table 9.2 A simple inverse measurement model for robots equipped with range fi nders 
Here a is the thickness of obstacles, and 8 the width of a sensor beam. The values locc and 
lfree in lines 9 and 11 denote the amount of evidence a reading carries for the two difference 
cases. 


We note that our algorithm makes occupancy decisions exclusively based on sensor 
measurements. An alternative source of information is the space claimed by the robot 
itself: When the robot’s pose is x+, the region surrounding x; must be navigable. Our 
inverse measurement algorithm in Table 9.2 can easily be modified to incorporate this 
information, by returning a large negative number for all grid cells occupied by a 
robot when at x+. In practice, it is a good idea to incorporate the robot's volume when 
generating maps, especially if the environment is populated during mapping. 


9.21  Multi-Sensor Fusion 


Robots are often equipped with more than one type of sensor. Hence, a natural ob- 
jective is to integrate information from more than one sensors into a single map. This 
question as to how to best integrate data from multiple sensors is particularly inter- 
esting if the sensors have different characteristics. For example, Figure 9.5 shows 
occupancy maps built with a stereo vision system, in which disparities are projected 
onto the plane and convolved with a Gaussian. Clearly, the characteristics of stereo are 
different from that of a sonar-based range finder, in that that are sensitive to different 
types obstacles. 
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(a) 


Figure 9.5 Estimation of occupancy maps using stereo vision: (a) camera image, (b) 
sparse disparity map, (c) occupancy map by projecting the disparity image onto the 2-D 
plane and convolving the result with a Gaussian. 


There are two basic approaches for fusing data from multiple sensors is to 
use Bayes filters for sensor integration. We can execute the algorithm occu- 
pancy.grid mapping in Table 9.1 with different sensor modalities, replacing the func- 
tion inverse sensor model accordingly. However, such an approach has a clear draw- 
back. If different sensors detect different types of obstacles, the result of Bayes filter- 
ing is ill-defined. Consider, for example, consider an obstacle that can be recognized 
by sensor A but not by sensor B. Then these two sensors will generate conflicting 
information, and the resulting map will depend on the amount of evidence brought 
by every sensor system. This is generally undesirable, since whether or not a cell is 
considered occupied depends on the relative frequency at which different sensors are 
polled. 


The second, more appropriate approach to integrating information from multiple sen- 
sors is to build separate maps for each sensor types, and integrate them using the most 
conservative estimate. Let m* = (m^) denote the map built by the k-th sensor type. 
Then the combined map is given by 


m, = maxmf (9.9) 


This map is the most pessimistic map given its components: If any of the sensor- 
specific map shows that a grid cell is occupied, so will the combined map. While this 
combined estimator is biased in factor of occupied maps, it is more appropriate than 
the Bayes filter approach when sensors with different characteristics are fused. 
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9.3 LEARNING INVERSE 
MEASUREMENT MODELS 


9.3.1 Inverting the Measurement Model 


The occupancy grid mapping algorithm requires a marginalized inverse measurement 
model, p(mi; | x, 2). This probability is called “inverse” since it reasons from effects 
to causes: it provides information about the world conditioned on a measurement 
caused by this world. It is marginalized for the 7-th grid cell; a full inverse would be of 
the type p(m | x, 2). In our exposition of the basic algorithm, we already provided an 
ad hoc procedure in Table 9.2 for implementing such an inverse model. This raises the 
question as to whether we can obtain an inverse model in a more principled manner, 
starting at the conventional measurement model. 


The answer is positive but less straightforward than one might assume at first glance. 
Bayes rule suggests 


p(m|z,z) = р(2 | са zu |2) (9.10) 


= np(z|x,m) р(т) (9.11) 


Here we silently assume p(m | x) = p(m), that is, the pose of the robot tells us 
nothing about the map—an assumption that we will adopt for sheer convenience. If 
our goal was to calculate the inverse model for the entire map at-a-time, we would now 
be done. However, our occupancy grid mapping algorithm approximates the posterior 
over maps by its marginals, one for each grid cell m;. The inverse model for the 7-th 
grid cell is obtained by selecting the marginal for the i-th grid cell: 


рш |) =m 1а) nlm) dm (9.12) 


This expression integrates over all maps m for which the occupancy value of grid cell 
i equals m;. Clearly, this integral cannot be computed, since the space of all maps is 
too large. We will now describe an algorithm for approximating this expression. The 
algorithm involves generating samples from the measurement model, and approximat- 
ing the inverse using a function approximator (or supervised learning algorithm), such 
as a polynomial, logistic regression, or a neural network. 
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9.3.2 Sampling from the Forward Model 


The basic idea is simple and quite universal: If we can generate random triplets of 
poses 21, measurements 210) апа map occupancy values m”! for any grid cell m;, 
we can learn a function that accepts a pose x and measurement z as an input, and 


outputs the probability of occupancy for m;. 


[К] 


A sample of the form (218! 210 m; )can be generated by the following procedure. 


1. Sample a random map ml" ~ p(m). For example, one might already have 
a database of maps that represents p(m) and randomly draws a map from the 
database. 


2. Sample a pose 210) inside the map. One may safely assume that poses are uni- 


formly distributed. 


3. Sample an measurement ZH ~ plz | 210) , тій). This sampling step is reminis- 


cent of a robot simulator which stochastically simulates a sensor measurement. 


4. Extract the desired “true” occupancy value m; for the target grid cell from the 
map m. 


The result is a sampled pose z , a measurement 214 , and the occupancy value the the 


grid cell m;. Repeated application of this sampling step yields a data set 


S; д — oce(m,)!4 
тү! z^ э. occ(m,)"l 


ya (9.13) 


tio za э осс(ш; 


These triplets may serve as training examples for function approximator, to approxi- 
mate the desired conditional probability p(m; | 2, 2). Here the measurements z and 
the pose x are input variables, and the occupancy value occ(m,) is a target for the 
output of the function approximator. 


This approach is somewhat inefficient, since it fails to exploit a number of properties 
that we know to be the case for the inverse sensor model. 


= Measurements should carry no information about grid cells far outside their per- 
ceptual range. This observation has two implications: First, we can focus our 
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sample generation process on triplets where the cell m; is indeed inside the mea- 
surement cone. And second, when making a prediction for this cell, we only have 
to include a subset of the data in a measurement z (e.g., nearby beams) as input 
to the function approximator. 


m The characteristics of a sensor are invariant with respect to the absolute coordi- 
nates of the robot or the grid cell when taking a measurement. Only the relative 
coordinates matter. If we denote the robot pose by x; = (a y 0)7 and the coor- 
dinates of the grid cell by m; = (zm, Ym; )”, the coordinates of the grid cell are 
mapped into the robot's local reference frame via the following translation and 
rotation: 


cosÜ —siné Zm,— T 
( sinÜ | cos0 ) ( Um; — V ) oe 


In robots with circular array of range finders, it makes sense to encode the relative 
location of a grid cell using the familiar polar coordinates (range and angle). 


m Nearby grid cells should have a similar interpretation under the inverse sensor 
model. This smoothness suggest that it may be beneficial to learn a single func- 
tion in which the coordinates of the grid cell function as an input, rather than 
learning a separate function for each grid cell. 


m [f the robot possesses functionally identical sensors, the inverse sensor model 
should be interchangeable for different sensors. For robots equipped with a cir- 
cular array of range sensors, any of the resulting sensor beam is characterized by 
the same inverse sensor model. 


The most basic way to enforce these invariances is to constrain the function approxi- 
mator by choosing appropriate input variables. A good choice is to use relative pose 
information, so that the function approximator cannot base its decision on absolute co- 
ordinates. It is also a good idea to omit sensor measurements known to be irrelevant to 
occupancy predictions, and to confine the prediction to grid cells inside the perceptual 
field of a sensor. By exploiting these invariances, the training set size can be reduced 
significantly. 


9.3.3 The Error Function 


To train the function approximator, we need an approximate error function. A pop- 
ular example are artificial neural networks trained with the Back-propagation algo- 
rithm. Back-propagation trains neural networks by gradient descent in parameter 
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space. Given an error function that measures the mismatch" between the network's 
actual and desired output, Back-propagation calculates the first derivative of the tar- 
get function and the parameters of the neural network, and then adapts the parameters 
in opposite direction of the gradient so as to diminish the mismatch. This raises the 
question as to what error function to use. 


A common approach is to train the function approximator so as to maximize the log- 
likelihood of the training data. More specifically we are given a training set of the 
form 


input! >  occ(m,)!!! 
input?) — occ(m,)2! 


input] —>  occ(m;)") (9.15) 


occ(mi) [К] is the i-th sample of the desired conditional probability, and input [К] is the 


corresponding input to the function approximator. Clearly, the exact form of the input 
may vary as a result of the encoding known invariances, but the exact nature of this 
vector will plays no role in the form of the error function. 

Let us denote the parameters of the function approximator by W. Assuming that each 


individual item in the training data has been generated independently, the likelihood 
of the training data is now 


Г р! | input), w) (9.16) 
and its logarithm is 
ДИ) = Mlogp(mP | input], И) (9.17) 


Here J defines the function we seek to maximize during training. 


Let us denote the function approximator by f (input!) W^). The output of this func- 
tion is a value in the interval [0; 1]. After training, we want the function approximator 
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output the probability of occupancy, that is: 


А [k] : [К] = 
f(input!™, W) if us = (9.18) 


[k] |. [К] = 
m; | input", W) = 
ки ар ) | 1— f(nput, W) if m; 


Thus, we seek an error function that adjusts W so as to minimize the deviation of this 
predicted probability and the one communicated by the training example. To find such 
an error function, we re-write (9.18) as follows: 


pm! | inputl, W) = f (input, wm? (1— f(nputlsl, w) 
(9.19) 


It is easy to see that this product and Expression (9.18) are identical. In the product, 
one of the terms is always 1, since its exponent is zero. Substituting the product 
into (9.20) and multiplying the result by minus one gives us the following function: 


JW) = – Y log [барии wy" (1 — fünputl*l, w)” 
= o у; m ló A [k] _ m [k] КОГ [А] 
= i log f (input! , W) + (1 — m; )log(1 — f (input, W)) 


(9.20) 


J (W) is the error function to minimize when training the function approximator. It is 
easily folded into any function approximator that uses gradient descent. 


9.3.4 Further Considerations 


Figure 9.6 shows the result of an artificial neural network trained to mimic the inverse 
sensor model. The robot in this example is equipped with a circular array of sonar 
range sensors mounted at approximate table height. The input to the network are rela- 
tive coordinates in polar coordinates, and the set of five adjacent range measurements. 
The output is a probability of occupancy: the darker a cell, the more likely it is occu- 
pied. As this example illustrates, the approach correctly learns to distinguish freespace 
from occupied space. The gray-ish area behind obstacles matches the prior probability 
of occupancy, which leads to no change when used in the occupancy grid mapping al- 
gorithm. Figure 9.6b contains a faulty short reading on the bottom left. Here a single 
reading seems to be insufficient to predict an obstacle with high probability. 
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(b) 


Figure9.6 Sensor interpretation: Three sample sonar scans (top row) and local occupancy 
maps (bottom row), as generated by the neural network. Bright regions indicate free-space, 
and dark regions indicate walls and obstacles (enlarged by a robot diameter). 


We note that there exists a number of ways to train a function approximator using ac- 
tual data collected by a robot, instead the simulated data from the forward model. In 
general, this is the most accurate data one can use for learning, since the measurement 
model is necessarily just an approximation. One such way involves a robot operating 
in a known environment with a known map. With Markov localization, we can lo- 
calize the robot, and then use its actual recorded measurements and the known map 
occupancy to assemble training examples. It is even possible to start with an approx- 
imate map, use the learned sensor model to generate a better map, and from that map 
use the procedure just outlined to improve the inverse measurement model. As this 
book is being written, the issue of learning inverse sensor models from data remains 
relatively unexplored. 
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Figure 9.7 The problem with the standard occupancy grid mapping algorithm in Sec- 
tion 9.2: For the environment shown in Figure (a), a passing robot might receive the (noise- 
free) measurement shown in (b). The factorial approach maps these beams into probabilistic 
maps separately for each grid cell and each beam, as shown in (c) and (d). Combining both 
interpretations yields the map shown in (e). Obviously, there is a confict in the overlap 
region, indicated by the circles in (e). The interesting insight is: There exist maps, such as 
the one in diagram (f), which perfectly explain the sensor measurement without any such 
confict. For a sensor reading to be explained, it suffi ces to assume an obstacle somewhere 
in the cone of a measurement, and not everywhere. 


9.4 MAXIMUM A POSTERIOR 
OCCUPANCY MAPPING 


9.4.1 The Case for Maintaining Dependencies 


In the remainder of this chapter, we will return to one of the very basic assumptions of 
the occupancy grid mapping algorithm. In Section 9.2, we assumed that we can safely 
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1: Algorithm MAP occupancy grid mapping(z;.:, 21:4): 
2i set m = {0} 
3: repeat until convergence 
4: for all cells m; do 
5: m; = argmax(lo)" + 5 log 

k=0,1 7 

measurement model(z;, ж, m with m; = k) 

6: endfor 
75 endrepeat 
8: return m 


Table 9.3 The maximum a posteriori occupancy grid algorithm, which uses conventional 
measurement models instead of inverse models. 


decompose the map inference problem defined over high-dimensional space of all 
maps, into a collection of single-cell mapping problems. This assumption culminated 
into the factorization in (9.4). This raises the question as to how faithful we should be 
in the result of any algorithm that relies on such a strong decomposition. 


Figure 9.7 illustrates a problem that arises directly as a result of this factorization. 
Shown there is a situation in which the the robot facing a wall receives two noise-free 
sonar range measurements. Because the factored approach predicts an object along the 
entire arc at the measured range, the occupancy values of all grid cells along this arc are 
increased. When combining the two different measurements shown in Figure 9.7c&d, 
a conflict is created, as shown in Figure 9.7e. The standard occupancy grid mapping 
algorithm "resolves" this conflict by summing up positive and negative evidence for 
occupancy; however, the result will reflect the relative frequencies of the two types of 
measurements, which is undesirable. 


However, there exist maps, such as the one in Figure 9.7f, which perfectly explains the 
sensor measurements without any such conflict. This is because for a sensor reading 
to be explained, it suffices to assume an obstacle somewhere in its measurement cone. 
Put differently, the fact that cones sweep over multiple grid cells induces important 
dependencies between neighboring grid cells. When decomposing the mapping into 
thousands of individual grid cell estimation problems, we lose the ability to consider 
these dependencies. 
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9.4.2 Occupancy Grid Mapping with Forward 
Models 


These dependencies are incorporated by an algorithm that outputs the mode of the 
posterior, instead of the full posterior. The mode is defined as the maximum of the 
logarithm of the map posterior, which we already encountered in Equation (9.1): 


* 


m* = argmaxlog p(m | 21-4, 21:4) (9.21) 


The map posterior factors into a map prior and a measurement likelihood (c.f., Equa- 
tion (9.11)): 


log p(m | zit, £t) = — logp(zia | zi: m) + log p(m) (9.22) 


The log-likelihood log p(z1:¢ | £1:+, m) decomposes into a sum all individual mea- 
surement log-likelihoods log p(z; | £+, m). Further, the log-prior log p(m) is simply a 
sum of the type 


logp(m) =  [logp(m;)]"* + [log(1 — p(m;))] 


1 


= М108(1 – p(m;)) + 5 (lo) (9.23) 


1 


where M denotes the number of grid cells, and lọ is adopted from (9.7). The term 
M log(1 — p(m;)) is obviously independent of the map. Hence it suffices to optimize 
the remaining expression and the data log-likelihood: 


т^ = argmax У log p(z | zem) + EC (9.24) 
"t t i 


A hill-climbing algorithm for maximizing this log-probability is provided in Table 9.3. 
This algorithm starts with the all-free map (line 2). It “flips” the occupancy value 
of a grid cell when such a flip increases the likelihood of the data (lines 4-6). For 
this algorithm it is essential that the prior of occupancy p(m;) is not too close to 1; 
otherwise it might return an all-occupied map. As any hill climbing algorithm, this 
approach is only guaranteed to find a local maximum. In practice, there are usually 
very few, if any, local maxima. 
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Figure 9.8 (a) sonar range measurements from a noise-free simulation; (b) Results of 
the standard occupancy mapper, lacking the open door. (c) A maximum a posterior map. 
(d) The residual uncertainty in this map, obtained by measuring the sensitivity of the map 
likelihood function with respect to individual grid cells.. 


Figure 9.8 illustrates the effect of the MAP occupancy grid algorithm. Figure 9.8a 
depicts a noise-free data set of a robot passing by an open door. Some of the sonar 
measurements detect the open door, while others are reflected at the door post. The 
standard occupancy mapping algorithm with inverse models fails to capture the open- 
ing, as shown in Figure 9.8b. The mode of the posterior is shown in Figure 9.8c. 
This map models the open door correctly, hence it is better suited for robot naviga- 
tion than the standard occupancy grid map algorithm. Figure 9.8d shows the residual 
uncertainty of this map. This diagram is the result of a cell-wise sensitivity analysis: 
The magnitude by which flipping a grid cell decreases the log-likelihood function is 
illustrated by the grayness of a cell. This diagram, similar in appearance to the regular 
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occupancy grid map, suggests maximum uncertainty for grid cells behind obstacles. It 
lacks the vertical stripes found in Figure 9.8a. 


There exists a 
number of limitations of the algorithm MAP occupancy grid mapping, and it can 
be improved in multiple ways. The algorithm is a maximum a posterior approach, and 
as such returns no notion of uncertainty in the residual map. Our sensitivity analysis 
approximates this uncertainty, but this approximation is overconfident, since sensitiv- 
ity analysis only inspects the mode locally. Further, the algorithm is a batch algorithm 
and cannot be executed incrementally. In fact, the MAP algorithm requires that all data 
is kept in memory. At the computational end, the algorithm can be sped up by initial- 
izing it with the result of the regular occupancy grid mapping approach, instead of an 
empty map. Finally. we note that only a small number of measurements are affected 
by flipping a grid cell in Line 5 of Table 9.3. While each sum is potentially huge, 
only a small number of elements has to be inspected when calculating the argmax. We 
leave the design of an appropriate data a structure to the reader as an exercise. 


9.5 SUMMARY 


This chapter introduced algorithms for learning occupancy grids. All algorithms in 
this chapter require exact pose estimates for the robot, hence they do not solve the 
general mapping problem. 


m Тһе standard occupancy mapping algorithm estimates for each grid cell individu- 
ally the posterior probability of occupancy. It is an adaptation of the binary Bayes 
filter for static environments. 


= Data from multiple sensors can be fused into a single map in two ways: By 
maintaining a single map using Bayes filters, and by maintaining multiple maps, 
one for each sensor modality, and extracting the most most pessimistic occupancy 
value when making navigation decisions. The latter procedure is preferable when 
different sensors are sensitive to different types of obstacles. 


m Тһе standard occupancy grid mapping algorithm relies on inverse measurement 
models, which reason from effects (measurements) to causes (occupancy). This 
differs from previous applications of Bayes filters in the context of localization, 
where the Bayes filter was based on a conventional measurement model that rea- 
sons from causes to effects. 
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m [15 possible to learn inverse sensor models from the conventional measurement 
model, which models the sensor from causes to effects. To do so, one has to 
generate samples, and learn an inverse model using function approximation. 


m  Thestandard occupancy grid mapping algorithm does not maintain dependencies 
in the estimate of occupancy. This is a result of decomposing the map poste- 
rior estimation problem into a large number of single-cell posterior estimation 
problem. 


m The full map posterior is generally not computable, due to the large number of 
maps that can be defined over a grid. However, it can be maximized. Maximizing 
it leads to maps that are more consistent with the data. However, the maximiza- 
tion requires the availability of all data, and the resulting maximum a posterior 
map does not capture the residual uncertainty in the map. 


Without a doubt, occupancy grid maps and their various extensions are vastly popular 
in robotics. This is because they are extremely easy to acquire, and they capture 
important elements for robot navigation. 
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10 


SIMULTANEOUS LOCALIZATION 
AND MAPPING 


10.1 INTRODUCTION 


This and the following chapters address one of the most fundamental problems in 
robotics, the simultaneous localization and mapping problem. 'This problem is com- 
monly abbreviated as SLAM, and is also known as Concurrent Mapping and Local- 
ization, or CML. SLAM problems arise when the robot does not have access to a 
map of the environment; nor does it have access to its own poses. Instead, all it is 
given are measurements 21.; and controls u1:¢. The term “simultaneous localization 
and mapping" describes the resulting problem: In SLAM, the robot acquires a map 
of its environment while simultaneously localizing itself relative to this map. SLAM 
is significantly more difficult than all robotics problems discussed thus far: It is more 
difficult than localization in that the map is unknown and has to be estimated along the 
way. Itis more difficult than mapping with known poses, since the poses are unknown 
and have to be estimated along the way. 


From a probabilistic perspective, there are two main forms of the SLAM problem, 
which are both of equal practical importance. One is known as the online SLAM 
problem: It involves estimating the posterior over the momentary pose along with the 
map: 


P(t m | z14, u14) (10.1) 


Here x; is the pose at time t, m is the map, and 21. and ит. are the measurements 
and controls, respectively. This problem is called the online SLAM problem since it 
only involves the estimation of variables that persist at time t. Many algorithms for the 
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online SLAM problem are incremental: they discard past measurements and controls 
once they have been processed. 


The second SLAM problem is called the full SLAM problem. In full SLAM, we seek 
to calculate a posterior over the entire path ©. along with the map, instead of just the 
current pose тү: 


р(21:0,т | zia, uia) (10.2) 


This subtle difference in the formulation of the SLAM problem between online and 
full SLAM has ramifications in the type algorithms that can be brought to bear. In 
particular, the online SLAM problem is the result of integrating out past poses from 
the full SLAM problem: 


plz, m | 21: и Ча) (10.3) 
- ff: f (212, M | zy, u1:2) dx, ало... dx4_-1 


In online SLAM, these integrations are typically performed one-at-a-time, and they 
cause interesting changes of the dependency structures in SLAM that we will fully 
explore in the next chapter. 


A second key characteristic of the SLAM problem has to do with the nature of the 
estimation problem. SLAM problems possess a continuous and a discrete component. 
The continuous estimation problem pertains to the location of the objects in the map 
and the robot’s own pose variables. Objects may be landmarks in feature-based rep- 
resentation, or they might be object patches detected by range finders. The discrete 
nature has to do with correspondence: When an object is detected, a SLAM algorithm 
must reason about the relation of this object to previously detected objects. This rea- 
soning is typically discrete: Either the object is the same as a previously detected one, 
or it is not. 


We already encountered similar continuous-discrete estimation problems in previous 
chapters. For example, EKF localization 7.5 estimates the robot pose, which is con- 
tinuous, but to do so it also estimates the correspondences of measurements and land- 
marks in the map, which are discrete. In this and the subsequent chapters, we will 
discuss a number of different techniques to deal with the continuous and the discrete 
aspects of the SLAM problem. 
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At times, it will be useful to make the correspondence variables explicit, as we did in 
Chapter 7 on localization. The online SLAM posterior is then given by 


р(21,т, с 23:5, Ча) (10.4) 


and the full SLAM posterior by 


p(zia, m, ea | z14, 14) (10.5) 


The online posterior is obtained from the full posterior by integrating out past robot 
poses and summing over all past correspondences: 


p(zi, m, Ct | Zt, Us) (10.6) 
= үү B JSS PS S priam Zt, Чаа) dz лә... dz, 4 
су C2 Ct—1 


In both versions of the SLAM problems—online and full—estimating the full pos- 
terior (10.4) or (10.5) is the gold standard of SLAM. The full posterior captures all 
there is to be known about the map and the pose or the path. In practice, calculat- 
ing a full posterior is usually infeasible. Problems arise from two sources: (1) the 
high dimensionality of the continuous parameter space, and (2) the large number of 
discrete correspondence variables. Many state-of-the-art SLAM algorithm construct 
maps with tens of thousands of features, or more. Even under known correspondence, 
the posterior over those maps alone involves probability distributions over spaces with 
10° or more dimensions. This is in stark contrast to localization problems, in which 
posteriors were estimated over three-dimensional continuous spaces. Further, in most 
applications the correspondences are unknown. The number of possible assignment to 
the vector of all correspondence variables, c;., grows exponentially in the time t. Thus, 
practical SLAM algorithms that can cope with the correspondence problem must rely 
on approximations. 


The SLAM problem will be discussed in a number of subsequent chapter. The remain- 
der of this chapter develops an EKF algorithm for the online SLAM problem. Much of 
this material builds on Chapter 3.3, where the EKF was introduced, and Chapter 7.5, 
where we applied the EKF to the mobile robot localization problem. We will derive a 
progression of EKF algorithms that first apply EKFs to SLAM with known correspon- 
dences, and then progress to the more general case with unknown correspondences. 
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CHAPTER 10 


2 SLAM WITH EXTENDED KALMAN 
FILTERS 


2.1 Setup and Assumptions 


Historically the earliest, and perhaps the most influential SLAM algorithm is based on 
the extended Kalman filter, or EKF. In a nutshell, the EKF SLAM algorithm applies 
the EKF to online SLAM using maximum likelihood data association. In doing so, 
EKF SLAM is subject to a number of approximations and limiting assumptions: 


10. 


Feature-based maps. Maps, in the EKF, are composed of point landmarks. 
For computational reasons, the number of point landmarks is usually small (e.g., 
smaller than 1,000). Further, the EKF approach tends to work well the less am- 
biguous the landmarks are. For this reason, EKF SLAM requires significant en- 
gineering of feature detectors, sometimes using artificial beacons or landmarks 
as features. 


Gaussian noise. As any EKF algorithm, EKF SLAM makes a Gaussian noise 
assumption for the robot motion and the perception. The amount of uncertainty 
in the posterior must be relatively small, since otherwise the linearization in EKFs 
tend to introduce intolerable errors. 


Positive measurements. The EKF SLAM algorithm, just like the EKF localizer 
discussed in Chapter 7.5, can only process positive sightings of landmarks. It 
cannot process negative information that arises from the absence of landmarks 
in a sensor measurements. This is a direct consequence of the Gaussian belief 
representation and was already discussed in Chapter 7.5. 


2.2 SLAM with Known Correspondence 


The SLAM algorithm for the case with known correspondence addresses the contin- 
uous portion of the SLAM problem only. Its development is in many ways parallel 
to the derivation of the EKF localization algorithm in Chapter 7.5, but with one key 
difference: In addition to estimating the robot pose x+, the ЕКЕ SLAM algorithm also 
estimates the coordinates of all landmarks encountered along the way. This makes it 
necessary to include the landmark coordinates into the state vector. 
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t2 


15: 


16: 


17: 
18: 
19: 
20: 
21: 


Algorithm EKF SLAM known. correspondences(, 1, 341, Ut, Zt, Ct): 
1 0 0 0...0 


0 1 0 0...0 
Роуз 
0.0 1 0...0 
Каре 
2N 
— 26 віп ре1,0 + Zt sin(Hr-1,6 + Аё) 
fit = ш-1 + FI 2t cos Me-1,0 — Zt Сов(ре-1,0 + weAt) 
VAN 
0 0 E COS [1t 1,0 — "m cos(ut—1,9 + co A) 
Ge=I+Fy | 0 0 sini — 2 sin(uicio + oA) Е, 
0 0 0 


Se = б 1 GP + FT Ry Е, 


ог 0 0 
Qi = 0 оф 0 
0 0 o; 


for all observed features z; = (ri фі si)? do 
ј= а 
if landmark j never seen before 


lj [ae .f( cos(bi + lino) 
By | =| Bey | +ri | sin(d: + hee) 
Hj,s S; 0 


endif 
PES Ôx = lj, — Pt,x 
ôy Hj,y — Шу 
q=0 5 
| va 
#1 = | atan2(dy,62) — fit,o 
Hj,s 
1 0 0 0...0 0 0 0 0...0 
0 1 0 0...0 0 0 0 0...0 
0.0 1 0...0 0 0 0 0...0 
Кыз=| 0 0 0 0...0 1 0 0 0...0 
0 0 0 0...0 0 1 0 0...0 
0 0 0 0...0 0 0 1 0...0 
— “oO 
2j-2 2N-2j 
| Vi: -yay 0 Vg Vg, 0 
Hi = б, Ox —1 =, =ô; 0 | Fr; 
mM _0 o 0 0 0 0 1 
ki =} Hi? (H$ 5, HİT + Qu)! 
endfor 


ш = fie +X., Ki (z — 2) 
X. = (1 – EXT. Hi) 
return Ht, Ut 


Table 10.1 The extended Kalman fi Iter (EKF) algorithm for the simultaneous localization 
and mapping problem, formulated here for a feature-based map and a robot equipped with 
sensors for measuring range and bearing. This version assumes knowledge of the exact 
correspondences. 
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For convenience, let us call the state vector comprising robot pose and the map the 
combined state vector, and denote this vector y;. The combined vector is given by 


“= (2) (10.7) 


T 
(cx yO mis; ту $1 Mee may S2 ... Туа MNy SN ) 


Here т, y, and 0 denote the robot's coordinates at time t, Mi x, Mi у are the coordi- 
nates of the 7-th landmark, for i = 1,..., N, and s; is its signature. The dimension 
of this state vector is 3N + 3, where N denotes the number of landmarks in the map. 
Clearly, for any reasonable number of N, this vector is significantly larger than the 
pose vector that is being estimated in Chapter 7.5, which introduced the EKF localiza- 
tion algorithm. EKF SLAM calculates the online posterior 


py: | 2, U1:t) (10.8) 


The ЕКЕ SLAM algorithm is depicted in Table 10.1—notice the similarity to the 
EKF localization algorithm in Table 7.2. Lines 2 through 5 apply the motion up- 
date, whereas Lines 6 through 20 incorporate the measurement vector. In particular, 
Lines 3 and 5 manipulate the mean and covariance of the belief in accordance to the 
motion model. This manipulation only affects those elements of the belief distribution 
concerned with the robot pose. It leaves all mean and covariance variables for the map 
unchanged, along with the pose-map covariances. Lines 7 through 18 iterate through 
all measurements. The test in Line 9 returns true only for landmarks for which we 
have no initial location estimate. For those, Line 10 initializes the location of such a 
landmark by the projected location obtained from the corresponding range and bearing 
measurement. As we shall discuss below, this step is important for the linearization 
in EKFs; it would not be needed in linear Kalman filters. For each measurement, an 
"expected" measurement is computed in Line 14, and the corresponding Kalman gain 
is computed in Line 17. Notice that the Kalman gain is a matrix of size 3 by 3N + 3. 
This matrix is usually non-sparse, that is, information is propagated through the en- 
tire state estimate. The final filter update then occurs in Lines 19 and 20, where the 
innovation is folded back into the robot's belief. 


The fact that the Kalman gain is fully populated for all state variables—and not just the 
observed landmark and the robot pose—is important. In SLAM, observing a landmark 
does not just improve the position estimate of this very landmark, but that of other 
landmarks as well. This effect is mediated by the robot pose: Observing a landmark 
improves the robot pose estimate, and as a result it eliminates some of the uncertainty 
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Figure 10.1 ЕКЕ applied to the online SLAM problem. The robot's path is a dotted line, 
and its estimations of its own position are shaded ellipses. Eight distinguishable landmarks 
of unknown location are shown as small dots, and their location estimations are shown as 
white ellipses. In (a)-(c) the robot's positional uncertainty is increasing, as is its uncertainty 
about the landmarks it encounters. In (d) the robot senses the fi rst landmark again, and the 
uncertainty of all landmarks decreases, as does the uncertainty of its current pose. 


of landmarks previously seen by the same robot. The amazing effect here is that we do 
not have to model past poses explicitly—which would put us into the realm of the full 
SLAM problem and make the EKF a non-realtime algorithm. Instead, this dependence 
is captured in the Gaussian posterior, more specifically, in the off-diagonal covariance 
elements of the matrix 5. 


252 CHAPTER 10 


Figure 10.1 illustrates the EKF SLAM algorithm for an artificial example. The robot 
navigates from a start pose which serves as the origin of its coordinate system. As 
it moves, its own pose uncertainty increases, as indicated by uncertainty ellipses of 
growing diameter. It also senses nearby landmarks and maps them with an uncertainty 
that combines the fixed measurement uncertainty with the increasing pose uncertainty. 
As a result, the uncertainty in the landmark locations grows over time. In fact, it par- 
allels that of the pose uncertainty at the time a landmark is observed. The interesting 
transition happens in Figure 10.1d: Here the robot observes the landmark it saw in the 
very beginning of mapping, and whose location is relatively well known. Through this 
observation, the robot’s pose error is reduced, as indicated in Figure 10. 1d—notice the 
very small error ellipse for the final robot pose! Further, this observation also reduces 
the uncertainty for other landmarks in the map! This phenomenon arises from a corre- 
lation that is expressed in the covariance matrix of the Gaussian posterior. Since most 
of the uncertainty in earlier landmarks is causes by the robot pose, and this very un- 
certainty persists over time, the location estimates of those landmarks are correlated. 
When gaining information on the robot's pose, this information spreads to previously 
observed landmarks. This effect is probably the most important characteristic of the 
SLAM posterior: Information that helps localize the robot is propagated through map, 
and as a result improves the localization of other landmarks in the map. 


10.2.3 Mathematical Derivation 


The derivation of the EKF SLAM algorithm for the case with known correspondences 
largely parallels that of the EKF localizer in Chapter 7.5. The key difference is the 
augmented state vector, which now includes the locations of all landmarks in addition 
to the robot pose. 


In SLAM, the initial pose is taken to be to origin of the coordinate system. This 
definition is somewhat arbitrary, in that it can be replaced by any coordinate. None of 
the landmark locations are known initially. The following initial mean and covariance 
express this belief: 


ш = (000... 0)? (10.9) 
0 0 0 oo -:- oco 
0 0 0 oo со 
0 0 0 oo :: со 

Mo = Өс» Бб tib. Т шыр eei (10.10) 
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The covariance matrix is of size (ЗЛ +3) x (3N +3). It is composed of a small 3 x З 
matrix of zeros for the robot pose variables. АП other covariance values are infinite. 


As the robot moves, the state vector changes according to the standard noise-free 
velocity model (see Equations (5.13) and (7.4)). In SLAM, this motion model is ex- 
tended to the augmented state vector: 


—$ sin 0+ 2 sin(0 + w,At) 
or cos — Et соѕ(0 + wu, At) 
PAN + yAt 
Ye = Wit 0 (10.11) 


The variables x, y, and 0 denote the robot pose in y, ,. Because the motion only 
affects the robot's pose and all landmarks remain where they are, only the first three 
elements in the update are non-zero. This enables us to write the same equation more 
compactly: 


— 0 sind + Zt sin(0 + w,At) 
Y ee Fr 2 cos 0 — 2 cos(0 + ux At) (10.12) 


шАї + ^g At 


Here F is a matrix that maps the 3-dimensional state vector into a vector of dimension 
3N +3. 


100 0...0 
0.1 0 0...0 
fa = 10.1 
i 001 0-0 (10.13) 
Sn 
3N columns 
The full motion model with noise is then as follows 
— 2t sind + 2* sin(0 + wi At) 
Ye = Ww-ic ЕТ ve cos 6 — = cos(6 + v At) +N (0, ЕТ В, Fy) 
wAt 
ч 
9(и,0:-1) 


(10.14) 
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where F T К, Е, extends the covariance matrix to the dimension of the full state vector 
squared. 


As usual in EKFs, the motion function g is approximated using a first degree Taylor 
expansion 


g(u yii) © glut, pui) + Ge (uii = mia) (10.15) 


where the Jacobian С, = g' (ut, ш) is the derivative of g at u+, as in Equation (7.8). 
Obviously, the additive form in (10.14) enables us to decompose this Jacobian into 
an identity matrix of dimension (3.N + 3) х (3N + 3) (the derivative of y+—1) plus a 
low-dimensional Jacobian 9; that characterizes the change of the robot pose: 


G = I+F? g Е, (10.16) 
with 
0 0 Æ cospi, — 2 со8(и+—1,ө + wAt) 
б = 0 D sin ye ie с sin(pi—1,0 + 4 At) (10.17) 
0 0 0 


Plugging these approximations into the standard EKF algorithm gives us Lines 2 
through 5 of Table 10.1. Obviously, several of the matrices multiplied in line 5 are 
sparse, which should be exploited when implementing this algorithm. The result of 
this update are the mean ji; and the covariance X, of the estimate at time t after up- 
dating the filter with the control u+, but before integrating the measurement z+. 


The derivation of the measurement update is similar to the one in Section 7.5. In 
particular, we are given the following measurement model 


| V (ту — £)? + (ту, — у)? c, 0 0 
Ae Ie atan2(mj, —y,mj, —z)—0 | +Л(0, | 0 оь 0 (10.18) 
Mj s 0 0 Os 
h(yt,J) Qi 


where т, y, and 0 denotes the pose of the robot, 7 is the index of an individual landmark 
observation in 2;, and j = с; is the index of the observed landmark at time t. This 
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expression is approximated by the linear function 
һи.) & hej) + Hi и) (10.19) 


Here Н} is the derivative of h with respect to the full state vector уу. Since h depends 
only on two elements of that state vector, the robot pose x+ and the location of the j-th 
landmark m;, the derivative factors into a low-dimensional Jacobian ^ and a matrix 
Fz j, which maps h; into a matrix of the dimension of the full state vector: 


He = ҺЕ (10.20) 


Here hi is the Jacobian of the function A(y;, j) at fiz, calculated with respect to the 
state variables 2; and mj: 


TIL a — my Ut — у 0 na Mj a Hity — Ut 0 
_ vd, qd, у, _ МЧ, 
i Шу — Yt Tja — Ш, Ut Шу Ш — Mj,x 
ho = 1 0 
qt qt qt qt 
0 0 0 0 0 1 
(10.21) 


The scalar qi = (Mj, — jua)? + (mj, — Ht,y), and as before, j = сї is the landmark 
that corresponds to the measurement z}. The matrix Fx ; is of dimension (3N +3) x 5. 
It maps the low-dimensional matrix ^? into a matrix of dimension (3N + 3) x 3: 


(10.22) 
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These expressions make up for the gist of the Kalman gain calculation in Lines 8 
through 17 in our EKF SLAM algorithm in Table 10.1, with one important extension. 
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When a landmark is observed for the first time, its initial pose estimate in Equation 
(10.9) leads to a poor linearization. This is because with the default initialization in 
(10.9), the point about which h is being linearized is (fij, Йуу jtj,s)’ = (0 0 0)7, 
which is a poor estimator of the actual landmark location. A better landmark es- 
timator is given in Line 10 Table 10.1. Here we initialize the landmark estimate 
(js fy fij). with the expected position. This expected position is derived from 
the expected robot pose and the measurement variables for this landmark 


[jx ШУ | сов(ф; c Дө) 
jy = Шу | +77 | sin(ót + ito) (10.23) 
hijs st 0 


We note that this initialization is only possible because the measurement function 
is bijective. Measurements are two-dimensional, as are landmark locations. In cases 
where a measurement is of lower dimensionality than the coordinates of a landmark, 
h is a true projection and it is impossible to calculate a meaningful expectation for 
(Hj By me from a single measurement only. This is, for example, the case in 
computer vision implementations of SLAM, since cameras often calculate the angle 
to a landmark but not the range. SLAM is then usually performed by integrating mul- 
tiple sightings to and apply triangulation to determine an appropriate initial location 
estimate. 


Finally, we note that ће EKF algorithm requires memory that is quadratic іп N, the 
number of landmarks in the map. Its update time is also quadratic in N. The quadratic 
update complexity stems from the matrix multiplications that take place at various 
locations in the EKF. 


10.3 EKF SLAM WITH UNKNOWN 
CORRESPONDENCES 


10.3.1 The General ЕКЕ SLAM Algorithm 


The EKF SLAM algorithm with known correspondences is now extended into the 
general EKF SLAM algorithm, which uses an incremental maximum likelihood (ML) 
estimator to determine correspondences. Table 10.2 depicts the algorithm. The input 
to this algorithm now lacks a correspondence variable с;. Instead, it includes the 
momentary size of the map, №1. 


Simultaneous Localization and Mapping 257 


1: Algorithm ЕКЕ SLAM(j;—:, 5:1, Ut, Zt, Nt—1): 
2: №, = Nia 
100 0...0 
3: r= 1 0 23 
0.0 1 0...0 
— Ф sinja-i,0 + St sin(ur-1,0 + wA) 
4 fit = ш-1 + Е, Z COS ше-1,0 — È СО8(и+—1,ө + w At) 
wrAt 
0 0 m COS 411,0 — ue cos(ut—1,9 + c A) 
5: G,—I-FZ | 0 0 Zsinprie — 2Zsin(pciooxAt) | Е, 
0 0 0 
6: Se = С. 501 СТ + Fe Re Fe 
ог 0 0 
7: a= 0 o 0 | 
0 0 Os 
8: for all observed features 2{ = (ri фі si)T do 
BN ШУ (f cos(ót + Hue) 
9: | [EE | = | Pty | z ѕіп(ф + Due) | 
DN ,s 8} 0 
10: fork = 1 to N;4-1 do 
11: бь = ee ) = ( Ege Шш ) 
k,y Hk,y — Ht,y 
12: Як = б}, бь 
ук 
13: #8 = | абап2(0, у, бк) — Due | 
Ек, 
100 0-0 000 0...0 
0 1.0 0-0 0 0 0 0...0 
0.0 1 0.0 0 0 0 0...0 
14: doce 
ооо 0.0 1 0 0 0...0 
ооо 0.0 O 1 0 0...0 
ооо 0.0 0 0 1 0...0 
x " М, —\/Ч„б 0 —\/Ч„бкь ү, 
15: A; = ak беу Ok a —1 —Ók,y —6k,x 
0 0 0 0 0 
16: W, = Af Ў, [Hr] + Qi 
17: тк = (21 — E). Wet (zi 20) 
18: endfor 
19: TNe+1 = А 
20: j(i) = argmin лк 
k 
21: Nt = max{ №, 7(%)} 
22: Ki = Se HY we 
23: endfor 
24: pm = B+ Y, Ki (44— 49) 
25: X= (1-5, Ki Hi 
26: return Ht, Ut 


0 
0 Fk 
1 


Table 10.2 The EKF SLAM algorithm with ML correspondences, shown here with outlier 
rejection. 
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The motion update in Lines 3 through 6 is equivalent to the one in 
EKF SLAM known correspondences in Table 10.1. The measurement update loop, 
however, is different. Starting in Line 8, it first creates the hypothesis of a new land- 
mark with index N; + 1; this index is one larger than the landmarks in the map at this 
point in time. The new landmark's location is initialized in Line 9, by calculating its 
expected location given the estimate of the robot pose and the range and bearing in the 
measurement. Line 9 also assigns the observed signature value to this new landmark. 
Next, various update quantities are then computed in Lines 10 through 18 for all Л, 4-1 
possible landmarks, including the “new” landmark. Line 19 sets the threshold for the 
creation of a new landmark: A new landmark is created if the Mahalanobis distance 
to all existing landmarks in the map exceeds the value o. The ML correspondence is 
then selected in Line 20. If the measurement is associated with a previously unseen 
landmark, the landmark counter is incremented in Line 21, and various vectors and 
matrices are enlarged accordingly—this somewhat tedious step is not made explicit 
in Table 10.2. The update of the EKF finally takes place in Lines 24 and 25. The 
algorithm EKF. SLAM returns the new number of landmarks N; along with the mean 
Ht and the covariance >. 


The derivation of this EKF SLAM follows directly from previous derivations. 
In particular, the initialization in Line 9 is identical to the one in Line 10 in 
EKF_SLAM_known_correspondences, Table 10.1. Lines 10 through 18 parallel 
Lines 12 through 17 іп EKF_SLAM known correspondences, with the added vari- 
able ть needed for calculating the ML correspondence. The selection of the ML cor- 
respondence in Line 20, and the definition of the Mahalanobis distance in line 17, is 
analogous to the ML correspondence discussed in Chapter 7.6; in particular, the al- 
gorithm EKF localization in Table 7.3 on page 175 used an analogous equation do 
determine the most likely landmark. (Line 14). The measurement updates in Lines 
24 and 25 of Table 10.2 are also analogous to those in the EKF algorithm with known 
correspondences, assuming that the participating vectors and matrices are of the ap- 
propriate dimension in case the map has just been extended. 


Our example implementation of EKF SLAM can be made more efficient by restrict- 
ing the landmarks considered in Lines 10 through 18 to those that are near the robot. 
Further, many of the values and matrices calculated in this inner loop can safely be 
cached away when looping through more than one feature measurement vector zł. in 
practice, a good management of features in the map and a tight optimization of this 
loop can greatly reduce the running speed. 
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Figure 10.2 ЕКЕ SLAM with known data association in a simulated environment. The 
map is shown on the left, with the gray-level corresponding to the uncertainty of each land- 
mark. The matrix on the right is the correlation matrix, which is the normalized covariance 
matrix of the posterior estimate. After some time, all x- and all y-coordinate estimates 
become fully correlated. 
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Estimated Path of the Vehicle 
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Figure 10.3 Example of Kalman fi Iter estimation of the map and the vehicle pose. Image 
courtesy of Stefan Williams and Hugh Durrant-Whyte from the Australian Centre for Field 
Robotics, University of Sydney, Australia. 


10.3.2 Examples 


Figure 10.2 shows the EKF SLAM algorithm—here with known correspondence— 
applied in simulation. The left panel of each of the three diagrams plots the posterior 
distributions, marginalized for the individual landmarks and the robot pose. The right 
side depicts the correlation matrix for the augmented state vector y;; the correlation 
is the normalized covariance. As is easily seen from the result in Figure 10.2c, over 
time all z- and y-coordinate estimates become fully correlated. This means the map 
becomes known in relative terms, up to an uncertain global location that cannot be 
reconciled. This highlights an important characteristic of the SLAM problem: The 
absolute coordinates of a map relative to the coordinate system defined by the initial 
robot pose can only be determined in approximation, whereas the relative coordinates 
can be determined asymptotically with certainty. 


In practice, EKF SLAM has been applied successfully to a large range of navigation 
problems, involving airborne, underwater, indoor, and various other vehicles. Fig- 
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Figure 10.4 Underwater vehicle Oberon, developed at the University of Sydney. Image 
courtesy of Stefan Williams and Hugh Durrant-Whyte from the Australian Centre for Field 
Robotics, University of Sydney, Australia. 


ure 10.3 shows an example result obtained using the underwater robot Oberon, devel- 
oped at the University of Sydney, Australia, and shown in Figure 10.4. This vehicle 
is equipped with a pencil sonar, a sonar that can scan at very high resolutions an de- 
tect obstacles up to 50 meters away. To facilitate the mapping problem, researchers 
have deposited long, small vertical objects in the water, which can be extracted from 
the sonar scans with relative ease. In this specific experiment, there is a row of such 
objects, spaced approximately 10 meters apart. In addition, a more distant cliff offers 
additional point features that can be detected using the pencil sonar. In the experiment 
shown in Figure 10.3, the robot moves by these landmarks, then turns around and 
moves back. While doing so, it measures and integrates detected landmarks into its 
map. using the Kalman filter approach described in this chapter. 


The map shown in Figure 10.3 shows the robot's path, marked by the triangles con- 
nected by a line. Around each triangle one can see an ellipse, which corresponds to 
the covariance matrix of the Kalman filter estimate projected into the robot's z-y posi- 
tion. The ellipse shows the variance; the larger it is, the less certain the robot is about 
its current pose. Various small dots in Figure 10.3 show landmark sightings, obtained 
by searching the sonar scan for small and highly reflective objects. The majority of 
these sightings is rejected, using a mechanism described further below, in the section 
that follows. However, some are believed to correspond to a landmark and added to 
the map. At the end of the run, the robot has classified 14 such objects as landmarks, 
each of which is plotted with the projected uncertainty ellipse in Figure 10.3. These 
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landmarks include the artificial landmarks put out by the researchers, but they also 
include various other terrain features in the vicinity of the robot. The residual pose 
uncertainty is small. In this example, the robot successfully finds and maps all of the 
artificial landmarks in this highly noisy domain. 


10.3.3 Feature Selection and Map 
Management 


Making EKF SLAM robust in practice often requires additional techniques for man- 
aging maps. Many of them pertain to the fact that the Gaussian noise assumption is 
unrealistic, and many spurious measurements occur in the far tail end of the noise dis- 
tribution. Such spurious measurements can cause the creation of fake landmarks in 
the map which, in turn, negatively affect the localization of the robot. 


Many state-of-the-art techniques possess mechanisms to deal with outliers in the mea- 
surement space. Such outliers are defined as spurious landmark sightings outside the 
uncertainty range of any landmark in the map. The most simple technique to compen- 
sate such outliers is to maintain a provisional landmark list. Instead of augmenting the 
map by a new landmark once a measurement indicates the existence of a new land- 
mark, such a new landmark is first added to a provisional list of landmarks. This list 
is just like the map, but landmarks on this list are not used to adjust the robot pose 
(the corresponding gradients in the measurement equations are set to zero). Once a 
landmark has consistently been observed and its uncertainty ellipse has shrunk, it is 
transitioned into the regular map. 


In practical implementations, this mechanism tends to reduce the number of landmarks 
in the map by a significant factor, while still retaining all physical landmarks with high 
probability. A further step, also commonly found in state-of-the-art implementations, 
is to maintain a posterior likelihood of the existence of a landmark. Such a probability 
may implemented as log-odds ratio and be denoted o; for the j-th landmark in the 
map. Whenever the j-th landmark is 72; observed, o; is incremented by a fixed value. 
Not observing m; when it would be in the perceptual range of the robot’s sensors leads 
to a decrement of o;. Since it can never be known with certainty whether a landmark 
is within a robot's perceptual range, the decrement may factor in the probability of 
such an event. Landmarks are removed from the map when the value o; drops below 
a threshold. Such techniques lead to much leaner maps in the face of non-Gaussian 
measurement noise. 


As noted previously, the maximum likelihood approach to data association has a clear 
limitation. This limitation arises from the fact that the maximum likelihood approach 


Simultaneous Localization and Mapping 263 


deviates from the idea of full posterior estimation in probabilistic robotic. Instead of 
maintaining a joint posterior over augmented states and data associations, it reduces 
the data association problem to a deterministic determination, which is treated as if 
the maximum likelihood association was always correct. This limitation makes EKF 
brittle with regards to landmark confusion, which in turn can lead to wrong results. In 
practice, researchers often remedy the problem by choosing one of the following two 
methods, both of which reduce the chances of confusing landmarks: 


= Spatial arrangement. The further apart landmarks are, the smaller the chance to 
accidentally confuse them. It is therefore common practice to choose landmarks 
that are sufficiently far away from each other to that the probability of confusing 
one with another is negligible. This introduces an interesting trade-off: a large 
number of landmarks increases the danger of confusing them. Too few landmarks 
makes it more difficult to localize the robot, which in turn also increases the 
chances of confusing landmarks. Little is currently known about the optimal 
density of landmarks, and researchers often use intuition when selecting specific 
landmarks. 


m Signatures. When selecting appropriate landmarks, it is essential to to maximize 
the perceptual distinctiveness of landmarks. For example, doors might possess 
different colors, or corridors might have different widths. The resulting signatures 
are essential for successful SLAM. 


With these additions, the EKF SLAM algorithm has indeed been applied successfully 
to a wide range of practical mapping problems, involving robotic vehicles in the air, 
on the ground, in the deep see, and in abandoned mines. 


A key limitation of EKF SLAM lies in the necessity to select appropriate landmarks. 
By doing so, most of the sensor data is usually discarded. Further, the quadratic update 
time of the EKF limits this algorithm to relatively scarce maps with less than 1,000 
features. In practice, one often seeks maps with 10° features or more, in which case 
the EKF ceases to be applicable. 


The relatively low dimensionality of the map tends to create a harder data association 
problem. This is easily verified: When you open your eyes and look at the full room 
you are in, you probably have no difficulty to recognize where you are! however, if 
you are only told the location of a small number of landmarks—e.g., the location of 
all light sources—the decision is much harder. As a result, data association in EKF 
SLAM is more difficult than in some of the SLAM algorithms discussed in subsequent 
chapter, and capable of handling orders of magnitude more features. This culminates 
into the principal dilemma of the EKF SLAM algorithm: While incremental maxi- 
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mum likelihood data association might work well with dense maps with hundreds of 
millions of features, it tends to be brittle with scarce maps. However, EKFs require 
sparse maps because of the quadratic update complexity. In subsequent chapters, we 
will discuss SLAM algorithms that are more efficient and can handle much large maps. 
We will also discuss more robust data association techniques. For its many limitation, 
the value of the EKF SLAM algorithm presented in this chapter is mostly historical. 


10.4 SUMMARY 


This chapter described the general SLAM problem, and introduced the EKF approach. 


= = The SLAM problem is defined as a concurrent localization and mapping problem, 
in which a robot seeks to acquire a map of the environment while simultaneously 
seeking to localize itself relative to this map. 


m The SLAM problem comes in two versions: online and global. Both problems 
involve the estimation of the map. The online problem seeks to estimate the 
momentary robot pose, whereas the global problem seeks to determine all poses. 
Both problem are of equal importance in practice, and have found equal coverage 
in the literature. 


m The EKF SLAM algorithm is arguably the earliest SLAM algorithm. It applies 
the extended Kalman filter to the online SLAM problem. With known correspon- 
dences, the resulting algorithm is incremental. Updates require time quadratic in 
the number of landmarks in the map. 


m When correspondences are unknown, the EKF SLAM algorithm applies an in- 
cremental maximum likelihood estimator to the correspondence problem. The 
resulting algorithm works well when landmarks are sufficiently distinct. 


= Additional techniques were discussed for managing maps. Two common strate- 
gies for identifying outliers include provisional list for landmarks that are not yet 
observed sufficiently often, and a landmark evidence counter that calculates the 
posterior evidence of the existence of a landmark. 


m EKF SLAM has been applied with considerable success in a number of robotic 
mapping problems. Its main drawbacks are the need for sufficiently distinct land- 
marks, and the computational complexity required for updating the filter. 


In practice, EKF SLAM has been applied with some success. When landmarks are 
sufficiently distinct, the approach approximates the posterior well. The advantage of 
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calculating a full posterior are manifold: It captures all residual uncertainty and en- 
ables the robot to reason about its control taking its true uncertainty into account. 
However, the EKF SLAM algorithm suffers from its enormous update complexity, 
and the limitation to sparse maps. This, in turn, makes the data association problem 
a difficult one, and EKF SLAM tends to work poorly in situations where landmarks 
are highly ambiguous. Further brittleness is due to the fact that the EKF SLAM algo- 
rithm relies on an incremental maximum likelihood data association technique. This 
technique makes is impossible to revise part data associations, and can induce failure 
when the ML data association is incorrect. 


The EKF SLAM algorithm applies to the online SLAM problem; it is inapplicable to 
the full SLAM problem. In the full SLAM problem, the addition of a new pose to the 
state vector at each time step would make both the state vector and the covariance grow 
without bounds. Updating the covariance would therefore require an ever-increasing 
amount of time, and the approach would quickly run out of computational time no 
matter how fast the processor. 


10.5 BIBLIOGRAPHICAL REMARKS 


Place them here! 


10.6 PROJECTS 


1. Develop an incremental algorithm for posterior estimation of poses and maps 
(with known data association) that does not rely on linearizing the motion model 
and the perceptual model. Our suggestion: Replace the Kalman filter by particle 
filters. 


2. The basic Kalman filter approach is unable to cope with the data association 
problem in a sound statistical way. Develop an algorithm (and a statistical frame- 
work) for posterior estimation with unknown data association, and characterize 
its advantages and disadvantages. We suggest to use a mixture of Gaussians rep- 
resentation. 


3. Based on the previous problem, develop an approximate method for posterior 
estimation with unknown data association, where the time needed for each incre- 
mental update step does not grow over time (assuming a fixed number of land- 
marks). 
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4. Develop a Kalman filter algorithm which uses local occupancy grid maps as its 
basic components, instead of landmarks. Among other things, problems that have 
to be solved are how to relate local grids to each other, and how to deal with the 
ever-growing number of local grids. 


5. Develop and algorithm that learns its own features for Kalman filter mapping. 
There could be two different variants: One which chooses features so as to min- 
imize the uncertainty (entropy) in the posterior, another which is given access to 
the correct map and seeks to maximize the probability of the correct map under 
the posterior. 


11 


THE EXTENDED INFORMATION 
FORM ALGORITHM 


11.1 INTRODUCTION 


The EKF SLAM algorithm described in the previous chapter is subject to a number 
of limitations, as discussed there. In this chapter, we introduce an alternative SLAM 
algorithm, called the extended information form SLAM algorithm, of EIF SLAM. EIF 
SLAM has in common with the EKF that it represents the posterior by a Gaussian. 
Unlike EKF SLAM, which solves the online SLAM problem, EIF SLAM solves the 
full SLAM problem. The reader may recall that the online SLAM posterior is defined 
over the map and the most recent robot pose, whereas the full SLAM posterior is de- 
fined over the map and the entire robot path. Another key difference is that ЕТЕ SLAM 
represents the posterior Gaussian in its information form, that is, via the information 
(or precision) matrix and the information state vector. Clearly, the information and 
the moments representation carry the same information; however, as we shall see, the 
update mechanisms of the information form are substantially different from that of the 
EKF. 


EIF SLAM is not an incremental algorithm; this is because it calculates posteriors over 
a robot path. Instead, EIF SLAM processes an entire data set at a time, to generate 
the full SLAM posterior. This approach is fundamentally different from EKF SLAM, 
which is incremental and enables a robot to update its map forever. EIF SLAM is best 
suited to problems where one seeks a map from a data set of fixed size, and can afford 
to hold the data in memory up to the time where the map is built. 


Because EIF SLAM has all data available during mapping, it can apply improved lin- 
earization and data association techniques. Whereas in EKF SLAM, the linearization 
and the correspondence for a measurement at time t are calculated based on the data 
up to time t, in EIF SLAM all data can be used to linearize and to calculate correspon- 
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dences. Further, EIF SLAM iterates the construction of the map, the calculation of 
correspondence variables, and the linearization of the measurement models and mo- 
tion models, so as to obtain the best estimate of all of those quantities. In doing so, 
EIF SLAM tends to produce maps that are superior in accuracy to that of EKFs. EIF 
SLAM is less brittle to erroneous data association, and it can cope with higher rota- 
tional uncertainties in the data than ЕКЕ SLAM. And finally, EIF SLAM is applicable 
to maps many orders of magnitude larger than those that can be accommodated by the 
EKF. 


This chapter first describe the intuition behind EIF SLAM and its basic updates steps. 
We then derive the various update steps mathematically and prove its correctness rel- 
ative to specific linear approximations. We then discuss actual implementations, in 
which posteriors are calculated over maps 


11.2 INTUITIVE DESCRIPTION 


The basic intuition behind EIF SLAM is remarkably simple. Suppose we are given 
a set of measurements 21.: with associated correspondence variables c.;, and a set 
of controls из. The first step of EIF SLAM will be to use those data to construct 
an information matrix and an information vector defined over the joint space of robot 
poses т and the map m = {m,}. We will denote the information matrix by О 
and the information vector by £. As we shall see below, each measurement and each 
control leads to a local update of Q and £. In fact, the rule for incorporating a control 
or a measurement into О and £ is a local addition, paying tribute to the important fact 
that information is an additive quantity. 


Make Figure: a robot path, landmarks, and an information matrix. 


Figure ?? illustrates the process of constructing the information matrix. Each control 
uz provides information about the relative value of the the robot pose at time t — 1 and 
the pose at time t. We can think of this information as a constraint between x,.., and 
тү, Or a "spring" in a spring-mass model of the world. The control u; is incorporated 
into О and £ by adding a values between the rows and columns connecting zr, у and 
2+. The magnitude of these values corresponds to the stiffness of the constraint—or 
the residual uncertainty between the relative poses caused by the noise in the motion 
model. This is illustrated in Figure ??, which shows the link between two robot poses, 
and the corresponding element in the information matrix. 
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Similarly, consider a measurement zł. This measurement provides information be- 
tween the location of the landmark j = c? and the robot pose 2; at time t. Again, 
this information can be thought of as a constraint. The respective update in the EIF 
amounts to adding values between the elements linking the pose x; and the map fea- 
ture mj. As before, the magnitude of these values reflects the residual uncertainty due 
to the measurement noise; the less noisy the sensor, the larger the value added to Q 
and £. 


After incorporating all measurements 24.; and controls u;.;, we obtain an information 
matrix О whose off-diagonal elements are all zero with two exceptions: Between 
апу two consecutive poses z, , and 2; will be a non-zero value that represents the 
information link introduced by the control u;. Also non-zero will be any element 
between a map feature rn; and a pose x+, if mj was observed when the robot was at 
тү. All elements between pairs of different landmarks remain zero. This reflects the 
fact that we never received information pertaining to their relative location—all we 
receive in SLAM are measurements that constrain the location of a landmark relative 
to a robot pose. Thus, the information matrix is sparse; all but a linear number of 
elements are zero. 


Of course, the information representation does not give us a map; neither does it tell 
us the estimated path of the robot. Both obtained via и = 9 1 (see Equation (3.72) 
on page 56). This operation involves the inversion of a sparse matrix. This raises 
the question on how efficiently we can recover the map, and the posterior X. It is 
the central question in EIF SLAM: EIF have been applied to maps with hundreds of 
millions of features, and inverting a matrix of this size can be a challenge. 


The answer to the complexity question depends on the topology of the world. If each 
feature is seen only once, the graph represented by the constraints in О is linear. Thus, 
Q can be reordered so that it becomes a band-diagonal matrix, that is, all non-zero 
values occur near its diagonal. The standard variable elimination technique for matrix 
inversion will then invert Q in linear time, by a single sweep through О. This intuition 
carries over to cycle-free world that is traversed once, so that each feature is seen for 
a short, consecutive period of time. 


The more common case, however, involves features that are observed multiple times, 
with large time delays in between. This might be the case because the robot goes back 
and forth through a corridor, or because the world possesses cycles, and the robot 
traverses a full cycle. In either situation, there will exist features rn; that are seen at 
drastically different time steps x+, and x;,, with t2 >> tı. In our constraint graph, 
this introduces a cyclic dependence: x+, and x+, are linked through the sequence of 
controls Ut +1, Ut;+2,---, Uta and through the joint observation links between z+, and 
Mj, and z+ and mj, respectively. Such links make our linear matrix inversion trick 
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inapplicable, and matrix inversion becomes more complex. Since most worlds possess 
cycles, this is the case of interest. 


The EIF SLAM algorithm now employs an important factorization trick, which we 
can think of as propagating information trough the information matrix (in fact, it is a 
generalization of the well-known variable elimination algorithm for matrix inversion). 
Suppose we would like to remove a feature m; from the information matrix О and 
the information state £. In our spring mass model, this is equivalent to removing the 
node and all springs attached to this node. As we shall see below, this is possible 
by a remarkably simple operation: We can remove all those springs between m; and 
the poses at which m; was observed, by introducing new springs between any pair of 
such poses. More formally, let т(7) be the set of poses at which m; was observed 
(that is: x, € r(j) <= Fi: ci = j). Then we already know that the feature rn; is 
only linked to poses z, in т(7); by construction, m; is not linked to any other pose, 
or to any landmark in the map. We can now set all links between rn; and the poses 
T(J) to zero by introducing a new link between any two poses 24,24 € 7(j). Simi- 
larly, the information vector values for all poses т(7) are also updated. An important 
characteristic of that this operation is local: It only involves only a small number of 
constraints. After removing all links to mj, we can safely remove т; from the in- 
formation matrix and vector. The resulting information matrix is smaller—it lacks an 
entry for m;. However, it is equivalent for the remaining variables, in the sense that 
the posterior defined by this information matrix is mathematically equivalent to the 
original posterior before removing mj. This equivalence is intuitive: We simply have 
replaced springs connecting mj to various poses in our spring mass model by a set 
of springs directly linking these poses. in doing so, the total force asserted by these 
spring remains equivalent, with the only exception that m; is now disconnected. 


Make Figure: reducing the graph by shifting edges around 


The virtue of this reduction step is that we can gradually transform our optimization 
problem into a smaller one. By removing each feature m, from О and ©, we ultimately 
arrive at a much smaller information form © and 3 defined only over the robot pose 
variables. The reduction can be carried out in time linear in the size of the map; in fact, 
it generalizes the variable elimination technique for matrix inversion to the information 
form, in which we also maintain an information state. The posterior over the robot path 
is now recovered as Xi = 07! and б = XE . Unfortunately, our reduction step does 
not eliminate cycles in the posterior, so the remaining matrix inversion problem may 
still require more than linear time. 


As a last step, EIF SLAM recovers the landmark locations. Conceptually, this is 
achieved by building a new information matrices О; and information vectors €; for 
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1: Algorithm EIF initialize(u,.;): 
HO0,a 0 
2: Hoy |= | 0 
10,0 0 
3: for all controls u; = (v; w)” do 
Las Ш—1, — Zt sinja-19 + 22 віш(де 1,0 + At) 
4: Шу = | amu |+ тн COS Ш—1,0 — z cos(ut—1,9 + wt A) 
Lt,0 1,0 шАї 
31 endfor 
6: return [10:4 


Table 11.1 Initialization of the mean pose vector j41:¢ in the ЕТЕ SLAM algorithm. 


each m;. Both are defined over the variable m; and the poses T(j) at which m; were 
observed. It contains the original links between m; and 7(j), but the poses (7) are set 
to the values in д, without uncertainty. From this information form, it is now very sim- 
ple to calculate the location of m;, using the common matrix inversion trick. Clearly, 
Q; contains only elements that connect to m,; hence the inversion takes time linear in 
the number of poses in T(J). 


It should be apparent why the information representation is such a natural represen- 
tation. The full SLAM problem is solved by locally adding information into a large 
matrix, for each measurement 2 and each control u+. To turn information into an 
estimate of the map and the robot path, information between poses and features is 
gradually shifted to information between pairs of poses. The resulting structure only 
constraints the robot poses, which are then calculated using matrix inversion. Once 
the poses are recovered, the feature locations are calculated one-after-another, based 
on the original feature-to-pose information. 


11.3 THE EIF SLAM ALGORITHM 


We will now make the various computational step of the ЕТЕ SLAM precise. The full 
EIF SLAM algorithm will be described in a number of steps. The main difficulty in 
implementing the simple additive information algorithm pertains to the conversion of 
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1: Algorithm ЕТЕ construct(u;.;, 21-4, C1:t; 10:1): 
2: setQh=0,€=0 
co 0 0 
3: add 0 oo 0 to Q at xo 
0 0 oo 
4: for all controls и; = (v; ux)? do 
ELS sin 41,0 + d sin(ji 1,0 + wA) 
5: ĉi = pea + e COS [411,8 — = cos(pa—1,0 + wt A) 
AC 
1 0 2 COS J4t—1,0 — ER COS(141—1,0 + QUAL) 
6 G,—| 0 1 DE sin 41,8 — a sin(14—1,0 + 4 At) 
0 0 1 
7 add 6 R7' (1 — Gi to Q at x, and zi 4 
8 add d Ret [24 + С, Ш—1] to € at x, and x4_1 
9: endfor 
10: for all measurements 2; do 
o, 0 0 
11: Qi = 0 o 0 
0 0 o; 
12: for all observed features zi = (ri фі st)” do 
13: j= сї 
14: a Ja Hj — Ша ) 
by Hjy T Шу 
15: q=8 ô 
А zi q 
16 Tp ( ate — 0 ) 
17: Ні 1 уд: уду 0 —\/4б„ уу ) 
t q 6 Ôx —1 —dy 02 
18: add HiT Qr! Hj toQ at z, and m; 
Ht,x 
Pt,y 
19: add Hi? Qr! [zi - zi - Hi | шө |] to€ ata, and m; 
Hua 
Hj 
20: endfor 
21: endfor 
22: return Q, € 


Table 11.2 Calculation of Q and € in the EIF algorithm. 
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Algorithm ЕТЕ reduce((?, ©): 
0-290 
£-t 
for each feature j do 
let т(ў) be the set of all poses x, at which feature j was observed 
subtract 0,0 OFF $ from € at 1.03) and m; 
subtract 9.03), з Qu Отуу from О at z,(; _ 
remove the rows and columns corresponding to feature j from © апа £ 
endfor 
return Q, € 


) and mj 


ооа оао Г-ге 


Ф 


Table 11.3 Algorithm for reducing the size of the information representation of the pos- 
terior. 


a conditional probability of the form p(z2 | x+, m) and p(x, | ш, 2,1) into a link 
in the information matrix. The information matrix elements are all linear; hence this 
step involves linearizing р(2 | x+, m) and р(х, | ш, 2.1). In EKF SLAM, this 
linearization was found by calculating a Jacobian at the estimated mean poses ио. То 
build our initial information matrix Q and &, we need an initial estimate ио. for all 
poses zo. 


There exist a number of solutions to the problem of finding an initial mean ш suitable 
for linearization. For example, we can run an EKF SLAM and use its estimate for 
linearization. For the sake of this chapter, we will use an even simpler technique: Our 
initial estimate will simply be provided by chaining together the motion model p(x; | 
Uz, 24.1). Such an algorithm is outlined in Table 11.1, and called there EIF initialize. 
This algorithm takes the controls u;.; as input, and outputs sequence of pose estimates 
рож. It initializes the first pose by zero, and then calculates subsequent poses by 
recursively applying the velocity motion model. Since we are only interested in the 
mean poses vector џо::, EIF initialize only use the deterministic part of the motion 
model. It also does not consider any measurement in its estimation. 


Once an initial ио. is available, the EIF SLAM algorithm constructs the full SLAM 
information matrix Q and the corresponding information vector &. This is achieved by 
the algorithm ЕТЕ construct, depicted in Table 11.2. This algorithm contains a good 
amount of mathematical notation, much of which shall become clear in our mathemat- 
ical derivation of the algorithm further below. EIF construct accepts as an input the 
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Algorithm ЕТЕ solve(C), ra ‚ 2,8): 
Уо = 0-1 
Ho:t = Уо: € 


for each feature j do 
let t(j) be the set of all poses x, at which feature j was observed 
ex 55 
pj = Оуу (6 + 9570) he) 
endfor 
return u, Xo: 


E Md 


Table 11.4 Algorithm for updating the posterior p. 


set of controls, из. the measurements 21.; and associated correspondence variables 
сд, and the mean pose estimates ио... It then gradually constructs the information 
matrix €) and the information vector £ by locally adding submatrices in accordance 
with the information obtained from each measurement and each control. In particular, 
Line 2 in EIF construct initializes the information elements. The "infinite" infor- 
mation entry in Line З fixes the initial pose zo to (0 0 0)7. It is necessary, since 
otherwise the resulting matrix becomes singular, reflecting the fact that from relative 
information alone we cannot recover absolute estimates. 


Controls are integrated in Lines 4 through 9 of EIF construct. The pose ĉ and the 
Jacobian С; calculated in Lines 5 and 6 represent the linear approximation of the non- 
linear measurement function g. As obvious from these equations, this linearization 
step utilizes the pose estimates 4o:+—1, with ui = (0 0 0)”. This leads to the updates 
for О, and £, calculated in Lines 7, and 8, respectively. Both terms are added into the 
corresponding rows and columns of Q and £. This addition realizes the inclusion of 
a new constraint into the SLAM posterior, very much along the lines of the intuitive 
description in the previous section. 


Measurements are integrated in Lines 10 through 21 of EIF construct. The matrix ©, 
calculated in Line 11 is the familiar measurement noise covariance. Lines 13 through 
17 compute the Taylor expansion of the measurement function, here stated for the 
feature-based measurement model defined in Chapter 6.6. This calculation culminates 
into the computation of the measurement update in Lines 18 and 19. The matrix that is 
being address to €? in Line 18 is of dimension 5 x 5. To add it, we decomposes it into a 
matrix of dimension 3 x 3 for the pose x;, a matrix of dimension 2 x 2 for the feature 
mj, and two matrices of dimension З x 2 and 2 х 3 for the link between x; and mj. 
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Those are added to €? at the corresponding rows and columns. Similarly, the vector 
added to the information vector £ is of vertical dimension 5. It is also chopped into 
two vectors of size 3 and 2, and added to the elements corresponding to т, and mj, 
respectively. The result of ЕТЕ construct is an information vector € and a matrix О. 
Q is sparse, containing only non-zero submatrices along the main diagonal, between 
subsequent poses, and between poses and features in the map. If the number map 
features outnumber the number of poses by a factor of 10, some 99% of the elements 
of Q are zero. The running time of this algorithm is linear in t, the number of time 
steps at which data was accrued. 


The next step of the EIF SLAM algorithm pertains to reducing the dimensionality of 
the information matrix/vector. This is achieved through the algorithm EIF reduce in 
Table 11.3. This algorithm takes as input О and £ defined over the full space of map 
features and poses, and outputs a reduced matrix О and vectors € defined over the 
space of all poses. This transformation is achieved by removing features m, one-at- 
a-time, in Lines 4 through 9 of EIF reduce. The book keeping of the exact indexes 
of each item in €) and ё is а bit tedious, hence Table 11.3 only provides an intuitive 
account. Line 5 calculates the set of poses 7(j) at which the robot observed feature 
j. It then extracts two submatrices from the present О: О, j » Which is the quadratic 
submatrix between m; and mj, and Q 7(j),j» Which is composed of ће off-diagonal 
elements between m; and the pose variables | j). It also extracts from the information 
state vector ё the elements corresponding to the j-th feature, denoted here as €;. It then 
subtracts information from © and ё as stated in Lines 6 and 7. After this operation, 
the rows and columns for the feature m; nj аге zero. These rows and columns are then 
removed, reducing the dimension on О and ra accordingly. This process is iterated 
until all features have been removed, and only pose variables remain in Q and ra . The 
complexity of EIF reduce is once again linear in t. 


The last step in the EIF SLAM algorithm computes the mean and covariance for all 
poses in the robot path, and a mean location estimate for all features in the map. This 
is achieved through ЕТЕ solve in Table 11.4. Lines 2 and 3 compute the path estimates 
Ho:t, by inverting thee reduced information matrix О and multiplying the resulting co- 
variance with the information vector. When © involves cycles, the inversion becomes 
the computationally most expensive step in the EIF SLAM algorithm. Subsequently, 
EIF solve computes the location of each feature in Lines 4 through 7. The return 
value of EIF solve contains the mean for the robot path and all features in the map, 
but only the covariance for the robot path. 


The quality of the solution calculated by the EIF SLAM algorithm depends on the 
goodness of the initial mean estimates, calculated by EIF initialize. The x- and y- 
components of these estimates effect the respective models in a linear way, hence the 
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Algorithm ЕТЕ SLAM known correspondence(u;.;, 21:1; C1:¢)! 
шоч = EIF initialize(u;.;) 
repeat 
Q, € = ЕТЕ construct(u;4, 21:2, Ct 0:0) 
Q, E = EIF_reduce(, £) 
u, Xig4 = EIF.solve(Q, £, О, €) 
until convergence 
return и 


oom om Ue Pe qo ra 


Table 11.5 The EIF SLAM algorithm for the full SLAM problem with known correspon- 
dence. 


linearization does not depend on they value. Not so for the orientation variables in 
ро:+. Errors in these initial estimates affect the accuracy of the Taylor approximation, 
which in turn affects the result. 


To accommodate errors in the initial ршо.:, the procedures ЕТЕ construct, ЕТЕ reduce, 
and EIF solve are run multiple times over the same data set. Each iteration takes 
as an input a estimated mean vector ио: for all poses, and outputs a new, improves 
estimate. The iteration of the EIF SLAM optimization are only necessary when the 
initial pose estimates have high error (e.g., more than 20 degrees orientation error). A 
small number of iterations (e.g., 3) is usually sufficient. 


Table 11.5 summarizes the resulting algorithm. It initializes the means, then repeats 
the construction step, the reduction step, and the solution step. Typically, two or three 
iterations suffice for convergence. The resulting mean и comprises the best estimates 
of the robot's path and the map. 


11.4 MATHEMATICAL DERIVATION 


The derivation of the EIF SLAM algorithm begins with a derivation of a recursive 
formula for calculating the full SLAM posterior, represented in information form. We 
then investigate each term in this posterior, and derive from them the additive SLAM 
updates through Taylor expansions. From that, we will derive the necessary equations 
for recovering the path and the map. 
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11.4.1 The Full SLAM Posterior 


As in the discussion of EKF SLAM, it will be beneficial to introduce a variable for the 
augmented state of the full SLAM problem. We will use y to denote state variables 
that combine one or more poses x with the map m. In particular, we define yo. to 
be a vector composed of the path о. and the map m, whereas y; is composed of the 
momentary pose at time ¢ and the map m: 


To 
21 
u ‘ _ Tt 
Yot = к апа Y = [2 (11.1) 
Tt 
m 


The posterior in the full SLAM problem is p(yo:t | 21:4, Чы: C1:t), where 214 are the 
familiar measurements with correspondences сі.:, and u;.; are the controls. Bayes 
rule enables us to factor this posterior: 


P(Yo:t | zu, U1:t, C1:t) 


= n p(z | Vo:t; 21:21, U1:t; C1:t) P(Yo:t | Pici teer) (11.2) 


where 7) is the familiar normalizer. The first probability on the right-hand side can be 
reduced by dropping irrelevant conditioning variables: 


plz | 9o:t; 21:21, Чы; C1:t) = plz | Yt, Ct) (11.3) 


Similarly, we can factor the second probability by partitioning уо. into z, and yo.4_1, 
and obtain 


P(Yo:t | 21а—1› Ча: C1:t) 
p(zi | Yii 15 #1:#—1› Чы) Cas) p(yis-a | 214—1, Ups Сы) 


= plz | 24 1,4) p(yis 1 | Z1:t—15 U1:t—1; C1:t 1) (11.4) 


Putting these expressions back into (11.2) gives us the recursive definition of the full 
SLAM posterior: 


P(Yo:t | 21) Чы: Сы) 
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= n p(z | Yt, Ct) р(х. | zia, Ut) pia 1 | 21:11, U1:t—1; C1:t 1) (11.5) 


The closed form expression is obtained through induction over t. Here p(yo) is the 
prior over the map m and the initial pose xo. 


P(Yo:t | £t Чы» Сы) = n p(yo) П | Yt, Ct) p(zi | Tt—1, Ut) (11.6) 
t 


= 1 P(yo) П p(zi | z«—1, Ue) П | Ye, c4) 


t i 


Here, as before, zi is the ¿i-th measurement in the measurement vector 2; at time t. 
The prior p(yo) factors into two independent priors, p(xo) and p(m). In SLAM, we 
have no initial knowledge about the map m. This allows us to replace p(yo) by p(xo), 
subsuming the factor p(m) into the normalizer r. 


The information from represents probabilities in logarithmic form. The log-SLAM 
posterior follows directly from the previous equation: 


log p(yo: | 21:2, 01:6, C1:1) (11.7) 


= const. + log p(zo) m3 log p(z | i1, Ut) ) So log p( (zi | yt, cl) 
t 


i 


This posterior has a simple form: It is a sum of terms. These terms include a prior, 
one term for each control u;, and one term for each measurement z;. As we shall see, 
EIF SLAM simply replaces those additive terms with quadratic constraints. 


11.4.2 Taylor Expansion 


The key approximation of EIF SLAM is the same as in the EKF: The motion and 
measurement models are approximated using linear functions with Gaussian error dis- 
tributions. Just as in Chapter 10, we assume the outcome of robot motion is distributed 
normally according to N (g(ut, 2,1), R+), where g is the deterministic motion func- 
tion, and R, is the covariance of the motion error. Similarly, measurements 2 are 
generated according to N (h(y;, сї), Q+), where h is the familiar measurement func- 
tion and Q; is the measurement error covariance. In equations, we have 


р(2, | 20-1,—) = N exp {—5 (2 E g(ui, 24—1))7 m (ж, — д(ш,а+—1))} 
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plizi | ус) = n exp {—5(2} —h(ye,ct))” Qr! (24 — h(n, с))} (11.8) 


The prior p(o) in (11.7) is also easily expressed by a Gaussian-type distribution. 
This prior anchors the initial pose х0 to the origin of the global coordinate system: 
zo — (0 0 0)7: 


р(х) = mexpí-izsj Qo zo} (11.9) 


with 
oo 0 0 
Qo = 0 oo 0 (11.10) 
0 0 oo 


It shall, at this point, not concern us that the value of co cannot be implemented, as 
we can easily substitute oo with a large positive number. This leads to the following 
quadratic form of the log-SLAM posterior in (11.7): 


log p(yo:t | 21:2, Чы; C14) 


= const. — i 21 Qo £o + SG: — glut, 24—1))7 В! (ж, — glut, z1—1)) 
t 


+Y - hlyn d)" Qr (i — hy. d) (11.11) 


This representation highlights an essential characteristic of the full SLAM posterior 
in the information form: It is composed of a number of quadratic terms, one for the 
prior, and one for each control and each measurement. 


However, these terms are quadratic in the functions g and h, not in the variables we 
week to estimate (poses and the map). This is alleviated by linearizing g and h via 
Taylor expansion, analogously to Equations (10.15) and (10.19) in the derivation of 
the EKF: 


glut, 20-1) © glut, pai) + Gi(ici — paa) 
һу, ct) h(t, с) + Ay (ye — ш) (11.12) 


Q 
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Where и is the current estimate of the state vector y+, and Н} = = hi F, z,j as defined 
already in Equation (10.20). 


11.4.3 Constructing the Information Form 


This linear approximation turns the log-likelihood (11.11) into a function that is 
quadratic in yo:+. In particular, we obtain 


log p(Yo:t | 21:2, Чы, C14). = const. — i 


lr Qo zo + У — 9(ш, 1) — Gi (ti — m1)? 


t 


В} [xe — glut, pii) — Ga(z«-i — He—1)] (11.13) 


+218 — Мше, с) — Ні — ш)] Өг, [ei — А, ci) — Ну — m} 


This function is indeed a quadratic, but it shall prove convenient to reorder its terms. 


log p(yo:t | zit; 1:014) = const. 


1 = 
-3 ж Qo 20 551. ( =@, ) RS — Gi) 2214 


quadratic in xo 
quadratic in zs: 


1 Š 
TELA ( ag. ) Ry [g(ue, pai) + Ge pai] (11.14) 
—$—$— ——— 


linear in 241: 


1 T iT 0-1 i T iT a oe | 7 i 7 
= у y Не © Ну + yt Ay О, [a = hlut ci) — Неш) 
— cM 
E quadratic in y; linear in yt 


Here z,..,.; denotes the vector concatenating 2,1 and x+; hence we can write (a4 — 
T VT 
Gi 2-1)" = Tiie (1 — G+). 


If we collect all quadratic terms into the matrix Q, and all linear terms into a vector £, 
we see that expression (11.14) is of the form 


log p(yo:t | 21:2, U1:t;C14) = const. — 5 1 Yot Q Yot + Yo. Яз (11.15) 
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We can read off these terms directly from (11.14), and verify that they are indeed 
implemented in the algorithm EIF construct in Table 11.2: 


m Prior. The initial pose prior manifests itself by a quadratic term Оо over the initial 
pose variable zo in the information matrix. Assuming appropriate extension of 
the matrix No to match the dimension of yo.;, we have 


Oe On (11.16) 


This initialization is performed in Lines 2 and 3 of the algorithm EIF construct. 
EIF construct. 


= Controls. From (11.14), we see that each control и, adds to Q and € the fol- 
lowing terms, assuming that the matrices are rearranged so as to be of matching 


dimensions: 
Q — a«( A кой — Gi) (11.17) 
б; 
1 СЕ 
а es ( zu ) Rz [g(us, jii) + Ge iii] (11.18) 


This is realized in Lines 4 through 9 in EIF construct. 


= Measurements. According to Equation (11.14), each measurement z/ transforms 
О and © by adding the following terms, once again assuming appropriate adjust- 
ment of the matrix dimensions: 


Q — бо+н" орі H} (11.19) 
€ — £€- HL Qi [8 hlu сі) — Ніші] (11.20) 


This update occurs in Lines 10 through 21 in EIF construct. 


This proves the correctness of the construction algorithm EIF construct, relative to 
our linear Taylor expansion approximation. 


We also note that the steps above only affect elements on the (generalized) main diag- 
onal of the matrix, or elements that involve at least one pose. Thus, all between-feature 
elements are zero in the resulting information matrix. 
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Marginals of a multivariate Gaussian. Let the probability distribution p(x, y) over the ran- 
dom vectors x and y be a Gaussian represented in the information form 


Q = & 1 mde . ee t3 (11.21) 


If Оуу is invertible, the marginal p(x) is a Gaussian whose information representation is 


Proof. The marginal for a Gaussian in its moments representation 


Lye Ууу Hy 
is V (Ux, Ух). By defi nition, the information matrix of this Gaussian is therefore У, and the informa- 
tion vector is X их. We show xd = Охх via the Inversion Lemma from Table 3.2 on page 44. Let 


P = (0 1)7, and let [оо] be a matrix of the same size as Оу but whose entries are all infi nite (and with 
[оо] = 0. This gives us 


2: do. Оу Tee des wg 
T(4—1 - cx ry m, cc 
л op ао) 
The same expression can also be expanded by the inversion lemma into: 
(2 + Ploo]P*)~* 
= Q-Q Plo]! +PT 9 P) PTA 
= Q-QOP(o-PTQp)!prTQo 
Q—Q Р(0,у) t PTQ 


Ош Ош Oy; Оу 0 Ол; Әуе Оуу 
@) ( О Ошу )-( 0 Qaey 951 ) ( О Ошу ) 
О Nyy 0 1 От Nyy 
E ( Os. Ж ры )-( Озу Ay Орь Ошу ) _ ( Оза 0 ) 
О Nyy Qua О 0 0 


The remaining statement, xl их = Ex, is obtained analogously, exploiting the fact that и = 0-1 (see 
Equation(3.72)) and the equality of the two expressions marked “(ж)” above: 


Cue ES. eque) 
) 


(9 EXE um -1 f Es 
= [о E i Q| Q 


Ue em 


Table 11.6 Lemma for marginalizing Gaussians in information form. 
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Conditionals of a multivariate Gaussian. Let the probability distribution p(x, y) 
over the random vectors x and y be a Gaussian represented in the information form 


О = Ga 1 апа SS Eo (11.23) 


The conditional p(x | y) is a Gaussian with information matrix Ол and information 
vector £; + (uy y. 


Proof. The result follows trivially from the definition of a Gaussian in information 
form: 


p(x | y) 


т т 
Q Q x x É 
ТРЕ | 2 у О Ош y y by 
= 1 exp {- 42T Олд + ТО уу — 5 + ШҮ + zl£, + yT ey} 
= 1 ехр{– 5270,2 + zT (Оуу + fe) 73 + y Оу-у Ey} 
———— 
const. 


Em exp(- iz T Qiu + (Оуу + &,)} (11.24) 


Table 11.7 Lemma for marginalizing Gaussians in information form. 
11.4.4 Reducing the Information Form 
The reduction step EIF reduce is based on a factorization of the full SLAM posterior. 


P(Yo:t | zu, U1:t, C1:t) 


= P(Lo:t | Zi, Чы, Сыз) p(m | 20:6, 21:6, Чы Cut) (11.25) 


Here p(zo; | 21:4, u1:5 Crt) ~ N (Q, ©) is the posterior over paths alone, with the map 
integrated out: 


p(zo: | Zi, Чы, Ct) = n | 21:6, Чы, Сыз) dm (11.26) 


As we will show shortly, this probability is caculated in EIF reduce in Table 11.3: 


р(хо | zu Ure cua) ~ NEO) (11.27) 
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In general, such an integration will be intractable, due to the large number of vari- 
ables in m. For Gaussians, this integral can be calculates in closed form. The key 
insight is given in Table 11.6, which states and proves the Marginalization Lemma for 
Gaussians. 


Let us subdivide the matrix О and the vector £ into submatrices, for the robot path xo-+ 
and the map m: 


= Оло. тоз E 
Q = ( Ou aos Om ) (11.28) 
a 7" 


The probability (11.27) is then obtained as 


tot 
| 


Оло. лог = оет ros хог (11.30) 
ОО En (11.31) 


The matrix О is block-diagonal. This follows from the way €? is constructed, 
in particular the absence of any links between pairs of landmarks. This makes the 
inversion efficient: 


—1 = T —1 
Oum = 2,F ОЁ, (11.32) 
2 


where Оу = КОЕТ is the sub-matrix of О that corresponds to the j-th feature in 
the map, that is 


0.0 100 0---0 
0.0 010 0.0 
Ee e si (11.33) 


м2 
j—th feature 


This makes it possible to to decompose the implement Equations (11.30) and (11.31) 
into a sequential update: 


О mex a 5 Qt hom. (11.34) 
j 
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E = Eron = P Raoni 055 5 (11.35) 
j 


The matrix Nro., j is non-zero only for elements in 7(j), the set of poses at which 
landmark j was observed. This essentially proves the correctness of the reduction 
algorithm EIF reduce in Table 11.3. The operation performed on €? in this algorithm 
implments an incomplete variable elimination algorithm for matrix inversion, applied 
to the landmark variables. 


11.4.5 Recovering the Path and the Map 


The algorithm EIF.solve in Table 11.4 calculates the mean and variance of the 
Gaussian V (£, Q), using the standard equations, see Equations (3.67) and (3.72) on 
page 55: 


(11.36) 
ra (11.37) 


Бє M 

| | 

Me OD 
| 


In particular, this operation provides us with the mean of the posterior on the robot 
path; it does not give us the locations of the features in the map. 


It remains to recover the second factor of Equation (11.25): 
p(m | zo: 21:6, uit C1:t) (11.38) 


The conditioning lemma, stated and proved in Table 11.7, shows that this probability 
distribution is Gaussian with the parameters 


Ym = 95! (11.39) 


m,m 


um = Emlêm + Nm zo) (11.40) 


Here ém ара m,m are the subvector of £, and the submatrix of О, respectively, re- 
stricted to the map variables. The matrix Qm „,., is the off-diagonal submatrix of О 
that connects the robot path to the map. As noted before, О,» is block-diagonal, 
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hence we can decompose 


p(m | 29:15 21:2, Чы» Сы) E Ll»; | 00:2, 21:5 Чы, Сы) (11.41) 
1 


where each p(m; | 19:4, 21:4, 1:4, Сы) is distributed according to 


эу == О (11.42) 
Hj Dl + От) = Mj cg) (11.43) 


he last transformation exploited the fact that the submatrix 0); xo., is zero except for 
those pose variables 7(7) from which the j-th feature was observed. 


It is important to notice that this is a Gaussian p(m | £o:t, 21:4, 01:4, С) conditioned 
on the true path zo.;. In practice, we do not know the path, hence one might want to 
know the posterior p(m | 21-2, U1:t, C1-4) without the path in the conditioning set. This 
Gaussian cannot be factored in the moments representation, as locations of different 
features are correlated through the uncertainty over the robot pose. For this reason, 
ЕТЕ solve returns the mean estimate of the posterior but only the covariance over the 
robot path. Luckily, we never need the full Gaussian in moments form—which would 
involve a fully populated covariance matrix of massive dimensions— as all essential 
questions pertaining to the SLAM problem can be answered at least in approximation 
without knowledge of У. 


11.5 DATA ASSOCIATION IN THE EIF 


Data association in EIFs is realized through correspdoncence variables, just as in ЕКЕ 
SLAM. Just like EKF SLAM, the EIF searches for a single best correspondence vector, 
instead of calculating an entire distribution over correspondences. Thus, finding a cor- 
respondence vector is a search problem. However, correspondences in EIF SLAM are 
defined slightly differentely: they are defined over pairs of features in the map, rather 
than associations of measurements to features. Specifrically, we say c(j, k) = 1 iff mj 
and mij, correspond to the same physical feature in the world. Otherwise, c(j, k) = 0. 
This feature-correspondence is in fact logically equivalent to the correspondence de- 
fined in the previous section, but it simplifies the statement of the basic algorithm. 


Our technique for searching the space of correspondences is greedy, just as in the EKF. 
Each step in the search of the best correspondence value leads to an improvement, as 
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measured by the appropriate log-likelihood function. However, because EIFs consider 
all data at the same time, it is possible to devise correspondence techniques that are 
considerably more powerful than the incremental approach in the ЕКЕ. In particular: 


1. At any point in the search, EIFs can consider the correspondence of any set of 
landmarks. There is no requirement to process the observed landmarks sequen- 
tially. 


2. Correspondence search can be interleaved with the calculation of the map. As- 
suming that two observed landmarks correspond to the same physical landmark 
in the world affects the resulting map. By incorporating such a correspondence 
hypothesis into the map, other correspondence hypotheses will subsequently look 
more or less likely. 


3. Data association decisions in EIFs can be undone. The goodness of a data as- 
sociation depends on the value of other data association decisions. What once 
appears to be a good choice may, at some later time in the search 


In the remainder of this chapter, we will describe one specific correspondence search 
algorithms that exploits the first two properties (but not the third). Our data associaiton 
algorithm will still be greedy, and it will sequentially seaerch the space of possible 
correspondences to arrive at a plausibale map. However, like all greedy algorithms, 
our approach is subject to local maxima; the true space of correspondences is of course 
exponential in the number of features in the map, and searching through all of them 
is infeasible in most environments. Hence, we will be content with a hill climbing 
algorithm. 


11.5.1 The EIF SLAM Algorithm With 
Unknown Correspondence 


The key component of our algorithm is a likelihood test for correspondence. Speci- 
fially, EIF correspondence is based on a simple test: What is the probability that two 
different features in the map, т; апа Mmg, correspond to the same physical feature in 
the world? If this probability exceeds a threshold, we will accept this hypothesis and 
merge both features in the map. 


The correspond test is depicted in Table 11.8: The input to the test are two feature 
indexes, 7 and k, for which we seek to compute the probability that those two feagtures 
corespond to the same feature in the physical world. To calculate this probability, 
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E Algorithm EIF correspondence test(C, £, и, 39.4, j, k): 
2: Оры = Bjk jk — Әт) Ec) тв) eG) jk 
3: Su] = 9p Mk 
T 
1 1 
4: Олук = 4 Ok] _1 
Т 
1 
5: Sank = | Q | ём 
6: Шду,к = ОА EAj,k 
" - 
Т return |27 Qa el 2 exp {-3 UA ik Олук pase} 


Table 11.8 The EIF SLAM test for correspondence: It accepts as input an information 
representation of the SLAM posterior, along with the result of the EIF solve step. It then 
outputs the posterior probability that m; corresponds to mg. 


our algorithm utilizes a number of quantities: The information representation of the 
SLAM posterior, as manifest by €? and £, and the result of the procedure EIF solve, 
which is the mean vector и and the path covariance У 0.;. 


The correspondence test then proceeds in the following way: First, it computes the 
marginalized posterir over the two target features. This posterior is represented by the 
information matrix ©; к] and vector €/;,,; computed in Lines 2 and 3 in Table 11.8. 
This step of the computation utilizes various sub-elements of the information form 
Q, £, the mean feature locations as specifcied through и, and the path covariance 29.4. 
Next, it calculates the parameters of a new Gaussian random variable, whose value is 
the difference between m; and mg. Denoting the difference variable А; = m; — 
m, the informaiton parameters Од ; к, &лу,к are calculated in Lines 4 and 5, and the 
corresponding expectation for the difference is computed in Line 6. Line 7 return the 
desired probability: the probability that the difference between m; and m; is 0. 


The correspondence test provides us now with an algorithm for performing data as- 
sociation search in EIF SLAM. Table 11.9 shows such an algorithm. It initializes the 
correspondence variables with unique values. The four steps that follow (Lines 3-7) 
are the same as in our ЕТЕ SLAM algorithm with known correspondence, stated in 
Table 11.5. However, this general SLAM algorithm then engages in the data associ- 
ation search. Specifically, for each pair of different features in the map, it calculates 


The Extended Information Form Algorithm 289 


1 Algorithm EIF SLAM(u;., 21:4: 

2 initialze all сі with a unique value 

3 Lo: = ЕТЕ initialize(u,.;) 

4 Q, E = EIF construct(u;., 21:4, C1; H0:t) 

5: Q,£— ЕТЕ reduce(Q, ©) 

6: u, Уол = EIF.solve(Q, £, Q, €) 

7 repeat 

8: for each pair of non-corresponding features mj, т». do 
9: calculate т; = ЕТЕ correspondence test(Q, £, p, 304, j, k) 
10: if тук > x then 

11: for all c; = k set ġ = j 

12: Q, E = EIF.construct(u;., 214, C1:t; 0:1) 
13: Q,£— ЕТЕ reduce(Q, ©) 

14: р, Xo = EIF во1уе(О,ё, Q, €) 

15: endif 

16: endfor 

17: until no more pair mj, mj, found with ту=к < x 

18: return и 


Table 11.9 The EIF SLAM algorithm for the full SLAM problem with unknown corre- 
spondence. The inner loop of this version can be made more effi cient by selective probing 
feature pairs mj, Mg, and by collecting multiple correspondences before solving the re- 
sulting collapsed set of equations. 


the probability of correspondence (Line 9 in Table 11.9). If this probability exceeds 
a threshold x, the correspondence vectors are set to the same value (Line 11). The 
EIF SLAM algorithm them interates the construction, reduction, and solution of the 
SLAM posterior (Lines 12 through 14). As a result, subsequent correspondence tests 
factor in previous correspondence decisionsm though a newly constructed map. The 
map construction is terminated when no further features are found in its inner loop. 


Clearly, the alorithm EIF_SLAM is not particularly efficient. In particular, it tests all 
feature pairs for correspondence, not just nearby ones. Further, it reconstructs the map 
whenevr a single correspondence is found; rather than processing sets of correspond- 
ing features in batch. Such modifications, however, are relatively straiughtforward. 
A good implementation of EIF SLAM will clearly be more refined than our basic 
implementation discussed here. 
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11.5.2 Mathematical Derivation 


We essentially restrict our derivatoion to showing the correctness of the correspon- 
dence test in Table ??. Our first goal shall be to define a posterior probability distribu- 
tion over a variable A; = m; — mx, the difference between the location of feature 
ту and feature ттк. Two features m; and ттк are equivalent iff their location is the 
same. Hence, by calculating the posteiror probability of А; к =, we obtain the desired 
correspondence probability. 


We obtain the posterior for А; ;, by first calculating the joint over m; and тк: 


p(mj, Mk | Zi, Чы, быз) 


= КО | Жү, he Cus) p(zis | Pie, Ча» С) dais (11.44) 


We will denote the information form of this marginal posterior by &; ку and Орук. 
Note the use of the squared brackets, which distinguish these values from the subma- 
trices of the joint information form. 


The distribution (11.44)is obtained from the joint posterior over yo.;, by applying the 
marginalization lemma. Specifically, here О and £ represent the joint posterior over 
the full state vector yo.;, and 7(j) and 7(k) denote the sets of poses at which the robot 
observed feature j, and feature k, respectively. Further, from our EIF solution we are 
given the mean pose vector р. To apply the marginalization lemma (Table 11.6), we 
shall leverage the result of the algorithm EIF solve. Specifically, EIF solve provides 
us already with a mean for the features m; and ттк. We simply restate the computation 
here for the joint feature pair: 


Bg = О (бук Osee) (11.45) 


Here 7(j, k) = r(j) U r(k) denotes the set of poses at which the robot observed m; 
ОГ mMk. 


For the joint posterior, we also need a covariance. This covariance is not computed in 
EIF solve, simply because the joint covariance over multiple features requires space 
quadratic in the number of features. However, for pairs of feautures the covariance 
of the joint is easily recovered. Specicially, let >, (; ку,т(у„ку be the submatrix of the 
covariance Xo. restricted to all poses in 7(j, k). Here the covariance >. is calculated 
in Line 2 of the algorithm EIF solve. Then the marginalization lemma provides us 
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with the marginal information matrix for the posterior over (m; mp)”: 
Qua] = Әк — Әт) Mea) Gao) Orga) (11.46) 


The information form representation for the desired posterior is now completed by the 
following information vector: 


бры = Әр Mg (11.47) 
Hence we have for the joint 


pmj, Mk | Z1:t; Чи C1:t) 


T T 
c d E ( ina ) Qua] ( "ie )+ ( me ) 27 


(11.48) 


These equations are identical to Lines 2 and 3 in Table 11.8. 


The nice thing about our representation is that it immediately lets us define the desired 
correspondence probability. For that, let us consider the random variable 


Ajk = mj — my (11.49) 


E E d 1 ) (11.50) 
Mk —1 ` 


Plugging this into the definition of a Gaussian in informtation representation, we ob- 
tain: 


р(А» к | 21a, Urt, Crt) 


1 ү 1 j^ 
= mnexp4-$ Ак ( _1 ) Әр, ( zy ) Ау AT. ( 21 ) SU 
SS SS 6 


=: QAj,k =: €Aj,k 
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T 
= mexp(-i AF, Олук + AT, Ease} (11.51) 


which is Gausian with information matrix Олук and information vector £A; as 
defined above. To calculate the probability that this Gaussian assumes the value of 
А; в = 0, it shall be useful to rewrite this Gaussian in moments form: 


P(Aja | 21:2, 01:2, Crt) (11.52) 


1 = 
= рт 05 „|72 exp {-3 (Aj — age)” Ол} (Age — ил) ) 
where the mean is given by the obvious expression: 


Hajk = ОО есу (11.53) 


These steps are found in Lines 4 through 6 in Table 11.8. 


The desired probability for A j к = 0 is the result of plugging 0 into this distribution, 
and reading off the resulting probability: 


1 ы 
plAj e =0 | 21t, Urt cre) = |2т Од}„| 2 exp {-3 HAs k Олук Hast} 
(11.54) 


This expression is the probability that two features in the map, 72; and тк, corre- 
spond to the same features in the map. This calculation is implemented in Line 7 in 
Table 11.8. 


11.6 EFFICIENCY CONSIDERATION 


Practical implementations of EIF SLAM rely on a number of additional insights and 
techniques for improving efficiency. Possibly the biggest deficiency of ЕТЕ SLAM, 
as discussed thus far, is due to the fact that in the very beginnning, we assume that 
all observed features constitute different landmarks. Our algorithm them unifies them 
one-by-one. For any reasonable number of features, such an approach will be unbear- 
ably slow. Further, it will neglect the imprtant constraint that at any point in time, the 
same feature can only be observed once, but not twice. 
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Existing implementations of the ЕТЕ SLAM idea exploit such opportunities. Spoecifi- 
cally, 


1. Features that are immediately identified to correspond with high likelihood are 
often unified form the very beginning, before running the full EIF SLAM solu- 
tion. For exmaple, it is quite common to compule short segments into local maps, 
e.g., local occupancy grid maps. EIF inference is then performed onlhy between 
those local occupancy grid maps, where the match of two maps is taken as a prob- 
abilistic constraint between the relative poses of these maps. Such a hierarchical 
technique reduces the complexity of SLAM by orders of magnitude, while still 
retaining some of the key elements of the EIF solution, specifically the ability to 
perform data association over large data sets. 


2. Many robots are equipped with sensors that observe large number of features at 
a time. For exampl, laser range finders observe dozens of features. For any such 
scan, one commonly assumes that different measurements indeed correspond to 
different features in the environment, by virtue of the fact that each scan points 
in a different direction. It therefore follows that i A j — ci A ci , that is, at no 
point in time correspond to different measurements to the same feature. 


Our pairwise data association technique above is unable to incorporate this con- 
straint. Specifically, it may assign two measurements z? and 27 to the same fea- 
ture z* for some s Æ t. To overcome this problem, it is common to assiciate 
entire measurement vectors z; and z, at the same time. This involves a calcula- 
tion of a joint over all features in z; and zs. Such a calculation generalizes our 
pairwaise calculation and is mathematically straightforward. 


3. The EIF algorithm stated in this chapter does not make use of its ability to undo 
data asociations. Once a data association decision is made, it cannot be reverted 
further down in the search. mathematically, it is relatively straightforward to undo 
past data association decisions in the information framework. in particular, one 
can change the correspondence variables of any two measurements in arbitrary 
ways in our algothm above. However, it is more difficult to test whether a data 
associaiton should be undone, as there is no (obvious) test for testing wether 
two previously associated features should be distinct. A simple implementation 
involves undoing a data association in question, rebyuilding the map, and testing 
whether our criterion above still calls for correspondence. Such an approach 
can be computationally involved, as it provides no means of detecting which data 
association to test. Mechnicsms for detecting unlikely associations are outside the 
scope of this book, but should be considered when implementing this approach. 
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Figure 11.1 The Groundhog robot is a 1,500 pound custom-built vehicle equipped with 
onboard computing, laser range sensing, gas and sinkage sensors, and video recording 
equipment. The robot has been built to map abandoned mines. 


4. Finally, the EIF algorihtm does not consider negative information. In practice, 
not seeing a feature can be as informative as seeing one. However, the simple 
formulaiton above does not perform the necessary geometric computations. 


In practice, whether or not we can exploit negative information depends on the 
nature of the sensor model, and the model of our features in the world. For 
example, we might have to compute probabilities of occlusion, which might be 
tricky for certain type sensors (e.g., range and bearing sensors for landmarks). 
However, contemporary implementations indeed consider negative information, 
but often by replacing proper probabiliustic calculations through approximations. 
One such example will be given in the next section. 


11.7 EMPIRICAL IMPLEMENTATION 


In the remainder of tihs chapter, we will highlight empirical results for an EIF SLAM 
implementation. The vehicle used in our experiment is Figure 11.1; it is a robot de- 
signed to map abandoned mines. 


The type map collected by the robot is shown in Figure 11.2. This map is an occu- 
pancy grid map, using effectively pairwise scan matching for recovering the robot's 
poses. Pairwise scan matching can be thought of as a version of EIF SLAM, but cor- 
respondence is only established between immediately consecutive scans. The result 
of this approach leads to an obvious deficiency of the map shown in Figure 11.2. 
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Figure 11.2 Map if a mine, acquired by pairwise scan matching. The diameter of this 
environment is approximately 250 meters. The map is obvisouly inconsistent, in that several 
hallways show up more than once. 
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Figure 11.3 Map if a mine, acquired by pairwise scan matching. The diameter of this 
environment is approximately 250 meters. The map is obvisouly inconsistent, in that several 
hallways show up more than once. 


To apply the EIF SLAM algorithm, our software decomposes the map into small local 
submaps, one for each five meters of robot travel. Within these five meters, the maps 
are sufficiently accurate, as general drift is small and hence the scan matching data 
association technique performs essentially flawlessly. Each submap's coordinates be- 
come a pose node in the EIF SLAM. Adjacent submaps are linked through the relative 
motion constraints between them. The resulting structure is shown in Figure 11.3. 
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Next, we apply the recursive data association search. The correspondence test is now 
implemented using a correlation analysis for two overlaying maps, and the Gaussian 
matching constraints are recovered by approximating this match function through a 
Gaussian. Figure 11.4 illustrates the process of data association: The circles each cor- 
respond to a new constraint that is imposed when contracting the information form of 
the EIF. This figure illustrates the iterative nature of the search: Certain correspon- 
dences are only discovered when others have been propagated, and others are dis- 
solved in the process of the search. The final model is stable, in that additional search 
for new data association induces no further changes. Displayed as a grid map, it yields 
the 2-D map shown in Figure 11.5. While this map is far from being perfect (largely 
due to a crude implementation of the local map matching constraints), it nevertheless 
is superior to the one found through incremental scan matching. 
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Figure 11.4 Data association search. 
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Figure 11.5 Final map, after optimizing for data associations. 
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11.8 SUMMARY 


This chapter introduced the extended information filter (EIF) approach to to the full 
SLAM problem. 


m the EIF SLAM algorithm addresses the full SLAM problem. It calculates pos- 
teriors over the full robot path along with the map. Therefore, EIF SLAM is a 
batch algorithm, not an online algorithm like EKF SLAM. 


m The key insight of the EIF SLAM algorithm is that the structure of information 
is sparse. Specifically, 


— Measurements provide information of a feature relative to the robot's pose 
at the time of measurement. In information space, they form constraints 
between these pairs of variables. 


— Similarly, motion provides information between two subsequent poses. In 
information space, each motion command forms a constraint between sub- 
sequent pose variables. 


EIF SLAM simply records all this information, through links that are defined 
between poses and features, and pairs of subsequent poses. However, this infor- 
mation representation does not provide estimates of the map, or the robot path. 


m The EIF SLAM algorithm recovers maps through an iterative procedure which 
involves three steps: Construction of a linear information form through Taylor 
extension; reduction of this form; and solving the resulting optimization problem. 
These three steps effectively resolve the information, and produce a consistent 
probabilistic posterior over the path and the map. Since EIF SLAM is run batch, 
we can repeat the linearization step and achieve gradually improved results. 


m Data association in EIF SLAM is performed by calculating the probability that 
two features have identical world coordinates. Since EIF SLAM is a batch al- 
gorithm, this can be done for any pair of features, at any time. This led to an 
iterative greedy search algorithm over all data association variables, which recur- 
sively identifies pairs of features in the map that likely correspond. 


m Practical implementations of the EIF SLAM often use additional tricks, to keep 
the computation low and to avoid false data associations. Specifically, practical 
implementations tend to reduce the data complexity by extracting local maps and 
using each map as the basic entity; they tend to match multiple features at-a-time, 
and they tend to consider negative information in the data association. 


m We briefly provided results for a variant of EIF SLAM that follows the decom- 
position idea, and that uses occupancy grid maps for representing sets of range 
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scans. Despite these approximations, we find that the EIF data association and 
inference techniques yield favorable results in a large-scale mapping problem. 
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12 


THE SPARSE EXTENDED 
INFORMATION FILTER 


12.1 INTRODUCTION 


The previous two chapters covered two ends of a spectrum of SLAM algorithms.EKF 
SLAM is proactive. Every time information is acquired, it resolves this information 
into a probability distribution. This step is computationally expensive: for n features 
in the map, it requires O(n?) computations per update. The EIF SLAM algorithm is 
different: It simply accumulates information. Such an accumulation is /azy: At the 
time of data acquisition, EIF SLAM simply memorizes the information is receives. To 
turn the accumulated information into a map, EIF SLAM had to perform inference. In 
EIF SLAM, this inference is performed after all data is acquired. This makes EIF an 
offline algortihm. 


This raises the question as to whether we can devise an online SLAM algorithm which 
inherits the efficiency of the information representation. The answer is yes, but only 
with a number of approximations. The sparse extended information filter, or SEIF, im- 
plements an information solution to the online SLAM problem. Just like the EKF, the 
SEIF integrates out past robot poses, and only maintains a posterior over the present 
robot pose and the map. But like EIFs and unlike EKFs, the SEIF maintains an in- 
formation representation of all knowledge. In doing so, updating the SEIF becomes 
a lazy information shifting operation; which is superior to the proactive probabilistic 
update of the ЕКЕ. Thus, SEIFs can be seen as the best of both worlds: They can be 
run online, and they are efficient. 
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А robot features 
link active passive 


normalized information matrix 


Figure 12.1 Illustration of the network of features generated by our approach. Shown on 
the left is a sparse information matrix, and on the right a map in which entities are linked 
whose information matrix element is non-zero. As argued in the paper, the fact that not all 
features are connected is a key structural element of the SLAM problem, and at the heart of 
our constant time solution. 


As an online algorithm, the SEIF maintains a belief over the very same state vector as 
the EKF: 


йе = tr (12.1) 


Here x; is the robot state, and m the map. The posterior under known correspondences 
is given by p(y | 21:4, u1:5 C14). This posterior differs from the one estimated by EIFs 
in that EIFs recovr the entire robot path. 


The key insight for turning EIFs into an online SLAM algorithm is illustrated in Fig- 
ure ??. This figure shows the result of the ЕКЕ SLAM algorithm, in a simulated 
environment containing 50 landmarks. The left panel shows a moving robot, along 
with its probabilistic estimate of the location of all 50 point features. The central 
information maintained by the EKF SLAM is a covariance matrix of these different 
estimates. The correlation, which is the normalized covariance, is visualized in the 
center panel of this figure. Each of the two axes lists the robot pose (location and 
orientation) followed by the 2-D locations of all 50 landmarks. Dark entries indicate 
strong correlations. We already discussed in the EKF SLAM chapter that in the limit, 
all feature coordinates become fully correlated—hence the checker-board appearance 
of the correlation matrix. 
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The right panel of Figure ?? shows the information matrix Q+, normalized just like the 
correlation matrix. As in the previous chapter, elements in this normalized information 
matrix can be thought of as constraints, or links, which constrain the relative locations 
of pairs of features in the map: The darker an entry in the display, the stronger the link. 
As this depiction suggests, the normalized information matrix appears to be sparse: 
it is dominated by a small number of strong links; and it possesses a large number of 
links whose values, when normalized, are near zero. Furthermore, the strength of each 
link is related to the distance of the corresponding features: Strong links are found only 
between metrically nearby features. The more distant two features, the weaker their 
link. This sparseness is distinctly different from that in the previous chapter: First, 
there exist strong links between pairs of landmarks. In the previous chapter, no such 
links could exist. Second, the sparseness is only approximate: In fact, all elements of 
the normalized information matrix are non-zero, but nearly all of them are very close 
to zero. 


The SEIF SLAM algorithm exploits this insight by maintaining a sparse information 
matrix, in which only nearby features are linked through a non-zero element. The 
resulting network structure is illustrated in the right panel of Figure 12.1, where disks 
correspond to point features and dashed arcs to links, as specified in the information 
matrix visualized on the left. This diagram also shows the robot, which is linked 
to a small subset of all features only. Those features are called active features and 
are drawn in black. Storing a sparse information matrix requires space linear in the 
number of features in the map. More importantly, all essential updates in SEIF SLAM 
can be performed in constant time, regardless of the number of features in the map. 
This result is somewhat surprising, as a naive implementation of motion updates in 
information filters—a perhaps in Table ?? on page 60—requires inversion of the entire 
information matrix. 


The SEIF is an online SLAM algorithm that maintains such a sparse information ma- 
trix, and for which the time required for all update steps is independent of the size of 
the map. This makes SEIF the first efficiet online algorithm for the SLAM problem, 
encountered in this book. 


12.2 INTUITIVE DESCRIPTION 


We begin with an intuitive description of the SEIF update, using grphical illustrations. 
Specifically, a SEIF update is composed of 4 steps: a mogtion update step, a measure- 
ment update step, a sparsificaiton step, and a state estimation step. 
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Figure 12.2 The effect of measurements on the information matrix and the associated 
network of features: (a) Observing yı results in a modifi cation of the information matrix 
elements (25, 4, . (b) Similarly, observing y» affects Qz,,y.. Both updates can be carried 
out in constant time. 
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Figure 12.3 Тһе effect of motion on the information matrix and the associated network 
of features: (a) before motion, and (b) after motion. If motion is non-deterministic, motion 
updates introduce new links (or reinforce existing links) between any two active features, 
while weakening the links between the robot and those features. This step introduces links 
between pairs of features. 
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Figure 12.4 Sparsifi cation: A feature is deactivated by eliminating its link to the robot. 
To compensate for this change in information state, links between active features and/or the 
robot are also updated. The entire operation can be performed in constant time. 
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We egin with the measurement update step, depicted in Figure 12.2. Each of the two 
panels show the information matrix maintained by the SEIF, along with the graph 
defined by the information links. In the measurement update, sensing a feature yı 
leads the SEIF to update the off-diagonal element of its information matrix that links 
the robot pose estimate x; to the observed feature yı. This is illustrate din the left 
panel of Figure 12.2a. Similarly, sensing y2 leads it to update the elements in the 
information matrix that link the robot pose x; and the feature y2, as illustrated in 
Figure 12.2b. As we shall see, each of these updates correspond to local additions in 
the information matrix (and the information vector). In both cases (information matrix 
and vector), this addition touches only elements that link the robot pose variable to the 
observed feature. This makes the measurement update noticeably efficient than the 
corresponding update steo in EKFs. In particular, the complexity of updating a SEIF 
in response to a measurement takes time independent of the size of the map. 


The motion update is shown in Figure 12.3. Here a robot's pose changes; Figure 12.3a 
depicts a the information state before, and Figure 12.3b after motion, respectively. 
The motion affects the information state in multiple ways. First, the links between the 
robot’s pose and the features 11, yo are weakened. This is a result of the fact that robot 
motion is erroneous, hence causes us to lose information about where the robot is 
relative to features in the map. However, this information is not entirely lost. Some of 
it is mapped into information links between pairs of features. This shift of information 
comes about since even though we lost information on the robot pose, we did not 
lose information on the relative location of features in the map. Whereas previously, 
those features were linked indirectly through the robot pose, they are now linked also 
directly after the update step. 


The shift of information from robot pose links to between-feature links is a key el- 
ement of the SEIF. In particular, the EIF discussed in the previous chapter never in- 
troduced any links between pairs of features. It is a direct consequence of using the 
information form as a filter, for the online SLAM problem. By integrating out past 
pose variables, we lose those links, and they are mapped back into the between-feature 
elements in the information matrix. 


For a pair of features to acquire a direct link in this process, both have to be active 
before the update, that is, their corresponding elements that links them to the robot 
pose in the information matrix have to be non-zero. This is illustrated in Figure 12.3: 
A between-feature link is only introduced between features y; and y2. Feature y3, 
which is not active, remains untouched. This suggests that by controlling the number 
of active landmarks at any point in time, we can control the computational complexity 
of the motion update, and the number of links in the information matrix. If the number 
of active links remains small, so will the update complexity for the motion update, and 
so will the number of non-zero between-landmark elements in the information matrix. 
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The information matrix in SLAM problem is approximately sparse. As we have argued 
above, the information matrix is dominated by a small number of elements between 
nearby features in the world, and links between features further away are compar- 
atively small. However, the sparseness is only approximate; it is not exact. SEIF 
therefore employs a sparsification step, illustrated in Figure 12.4. The sparsification 
involves the removal of a link between the robot and an active feature, effectively 
turning the active feature into a passive one. In SEIFs, this arc removal leads to a 
redistribution of information into neighboring links, specifically between other active 
features and the robot pose, and the time to perform sparsification is independent of 
the size of the map. However, it is an approximation, one that induces an information 
loss in the robot's posterior. The benefit of this approximation is that it induces true 
sparseness, and hence makes it possible to update the filter efficiently. 


There exists one final step in the SEIF algorithm, which is not depicted in any of the 
figures. This step involves the propagation of a mean estimate through the graph. 
As was already discussed in Chapter 3, the extended information filter requires an 
estimate of the state u+ for linearization of the motion and the measurement model. 
SEIFs also require a state estimate for the sparsification step. 


Clearly, one could recover the state estimate through the equation и = 071, where 
Q is the information vector, and € the information state. However, this would require 
inverting a large matrix, which would render SEIFs computationally inefficient. SEIFs 
circumvent the step by an iterative algorithm that effectively propagates state estimates 
through the information graph. Each local state estimate is updated based on the best 
estimates of its neighbors in the information graph. This iterative algorithm converges 
to the true mean u. Since the information form is sparse in SEIFs, each such update 
requires constant time, though with the caveat that more than a finite number of such 
updates may be needed to achieve good results. To keep the computation independent 
of the size of the state space, SEIFs perform a fixed number of such updates at any 
iteration. The resulting state vector is only an approximation, which is used instead of 
the correct mean estimate in all updating steps. 


12.3 THE SEIF SLAM ALGORITHM 


The outer loop of the SEIF update is depicted in Table 12.1. The algorithm accepts as 
input an information matrix Q;_1, the information vector €;_ 1, and an estimate of the 
state 44-1. It also accepts a measurement z+, a control u+, and a correspondence vector 
ct. The output of the algorithm SEIF SLAM known correspondences is a new state 
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1: Algorithm SEIF SLAM known.correspondences(£,..; , Q, 4, 11, Ut, Zt, Ct): 
2: Er, Qr, dt = SEIF motion. update(£, 1, 0—1, ju 1, Ut) 
3: ш = SEIF update state estimate(£;, Ө, д) 
4: £c, Q, = SEIF measurement. update(£,, Qt, Ht, Zt, Cz) 
5: &,Q, = SEIF sparsification(£;, Q) 
6: return £r, Qi, pu 
Table 12.1 The Sparse Extended Information Filter algorithm for the SLAM Problem, 
here with known data association. 
1: Algorithm SEIF motion update(£, 4,6), 1, 1 4, 04): 
1 0 0 0...0 
2: ps 0 1 0 0...0 
0.0 1 0...0 
—— 
2N 
— sinJa i9 + z- sin(ue-1,0 + wA) 
3: б= F COS 411,0 — F cos(pa—1,0 + wi A) 
AN 
0 0 S COS Ш—1,0 — "m cos(p—1,0 + wi A) 
4: A=] 0 0 т sin 141,0 — m sin(1—1,0 + 4 At) 
0 0 0 
5 Ф, = ЕТ + А): — ДЕР, 
6: Mt = VI Q4 + Q4 V, + VI Q4 V, 
F: Ф, = Q, 4 + At 
8 kt = Ф, FT (Ry! + Fy Ф, FT)! Е, Ф, 
9: О, = Ф, — Kt 
10: & = 61+ (At — к) ш +0, FT б, 
11: fit =шщ-—1+ FY ô 
12: return £, Q, [it 


Table 12.2 The motion update in SEIFs. 
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Algorithm SEIF measurement update(£,, Q4, Ht, Zt, с): 


o, 0 0 
О, = 0 o 0 
0 0 o; 
for all observed features z; = (ті фі s;)" do 
j= & 
if landmark j never seen before | 
Ij Lt f eos(ót + uuo) 
Шу |= | bey | ore | вшщ(ф+ шиш) 
у, st 0 
endif 
oum зе | 
ду Нуу — Pty 
4= 55 
va 
2 = | atan2(d,, ôx) — Lt, 
Hj,s = 
fe —qóy. 0 0.--0 адд . 09, 0 
Hisl б, б -1 0.0 —-ф —б„ 0 
ка 0 0 0 0...0 0 0 1 
—— 
3j-3 
endfor 


&=&+У НГ Qr p - 4 - Hi u 
0 = 0, - Y, Hi? Qi Hi 
return £,, Qa 


Table 12.3 The measurement update step in SEIFs. 


The Sparse Extended Information Filter 311 


- 


Algorithm SEIF _sparsification(€;, Q+): 

2: defi ne Км, Fx,mo, Fx as projection matrices from y+ to mo, (x, mo}, 
and x, respectively 

3: Oe = Op P (ЕТ, R Em)! ELO 

t о? Е, то Иа н О? It ET s 99 

— О, Е, (FT QU Fo)! ЕТО, 


4: & = & + ш (Qe — б) 
return &, 0, 


Table 12.4 The sparsifi cation step in SEIFs. 


1: Algorithm SEIF update state estimate(£,, О,, д): 
2i for a small set of map features m; do 
0.0 1 0 0...0 
3: Е; = 05220 0 1 0-0 
2(N—i) 2(i—1)x 
4: ша = (Е; О, FI) F; |, — 9; +0, FT Е; д] 
5: endfor 
6: for all other map features m; do 
T: Mit = Шш 
8: endfor 
100 0...0 
9: Е, = 0 1 0 0...0 
0.0 1 0... 
м2 
2N 
10: Hat = (Е, 9, ET Р, [£; — О, fe + О, ЕТ F, ji] 
11: return [lt 


Table 12.5 The amortized state update step in SEIFs updates a small number of state 
estimates. 
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estimate, represented by the information matrix О, and the information vector £,. The 
algorithm also outputs an improved estimate uz. 


The SEIF update is implemented in 4 steps. The motion update in Table 12.2 incorpo- 
rates the control и, into the filter estimate. It does so through a number of operations, 
which ensure computational efficiency. Specifically, the only components of the in- 
formation vector/matrix that are modified in this update are those of the robot pose 
and the active features. The measurement update in Table 12.3 incorporates the mea- 
surement vector z; under known correspondence c;. This step is also local, just like 
the motion update step. It only updates the informaiton values of the robot pose and 
the observed features in the map. The sparsification step, shown in Table 12.4, is an 
approximate step: It removes active features by transforming the informaiton matrix 
and hte information vector accordingly. This step is again efficient; it only modifies 
links between the robot and the active landmarks. Finally, the state estimate update 
in Table 12.5, applies an amortized coordinate descent technique to recover the state 
estimate u+. This step once again exploits the sparseness of the SEIF, through which it 
only has to consult a small number of other state vector elements in each incremental 
update. 


Together, the entire update loop of the SEIF is constant time, in that the processing 
time is independent of the size of the map. This is at stark contrast to the ЕКЕ, which 
requires time quadraticin the size of the map for each update. However, this constant 
time" statement should be taken with a grain of salt: The recovery of state estimates is 
a computational problem for which no linear-time solution is presently known, if the 
environment possesses large cycles. Thus, the result of SEIF, when run as constant 
time filter, may degrade as the map diameter increases. We will revisit this issue at the 
very end of the paper. 


12.4 MATHEMATICAL DERIVATION 


As usual, the reader may skip the mathematical derivation of the SEIF, and proceed 
directly to Section 12.7 on page 323. 


12.4.1 Motion Update 


The motion update in SEIF processes the control u;. It does so by transforming the 
information matrix Q;—; and the information vector €,_; into a new matrix О, and 
vector £;, representing the belief at time t. As usual, the bar in our notation indicates 
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that this prediction is only based on the control; it does not yet take the measurement 
into account. 


The derivation of the motion update is best started with the corresponding formula 
for the EKF. We begin with the algorithm ЕКЕ SLAM known correspondences in 
Table 10.1, page 249. Lines 3 and 5 state the motion update, which we restate here for 
the reader's convenience: 


ш = Mit Fló (12.2) 
$3. = dx ОТ B^ В, Fy (12.3) 


The essential elements of this update were defined as follows: 


100 0...0 

F, = |010 00 (12.4) 
0.0 1 0...0 
= sin J—1,9 + = sin(1i—1,0 + 4 At) 

6 = 2 COS 1,9 — 56 COS(f4-1,9 + oA) (12.5) 
ш At 

0 0 с. COS J41—1,0 — me cos(pa—1,0 + wi A) 

А = 0 0 e sin 141,9 — a sin(4a 1,9 + 4 At) (12.6) 
0 0 

G, = 1+ЕТАЕ, (12.7) 


In SEIFs, we have to define the motion update over the information vector € and the 
information matrix Q. From Equation (12.3), the definition of С; in (12.7), and the 
information matrix equation О = X: !, it follows that 


v —1 
0, = [G Q GT + FT В, E,] 
[G+ FTA Fp) QG} 1+ ЕТА Е) ЕТ В, Е]. (12.8) 


A key insight is this update can be implemented in constant time—tregardless of the 
dimension of Q. The fact that this is possible for sparse matrices 24-1 is somewhat 
non-trivial, since Equation (12.8) seems to require two nested inversions of matrices 
of size (2n + 3) x (2n + 3). As we shall see, if 9,1 is sparse, this update step can 
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be carried out efficiently. We define 


[G Qz} GT] 
= [GI]! Qe Go. (12.9) 


Ф, 


and hence Equation (12.8) can be rewritten as 
О, = [Ф!+ЕТВ,Е,] | (12.10) 
We now apply the matrix inversion lemma and obtain: 


[® pr 
$,— Ф, РІ(В + F, Ф, FT)! Е, Ф, 
S 


= 
[ 


Kt 


= ð -k (12.11) 


Here к; is defined as indicated. This expression can be calculated in constant time if 
we can compute Ф, in constant time from (),. 1. To see that this is indeed possible, 
we note that the argument inside the inverse, Ry 8 Е. Ф, ЁТ , is 3-dimensional. 
Multiplying this inverse with F7 and F, induces a matrix that is of the same size as 
О; however, this matrix is only non-zero for the З x З sub-matrix corresponding to the 
robot pose. Multiplying this matrix with a sparse matrix О,_1 (left and right) touches 
only elements for which the off-diagonal element in Q¿—1 between the robot pose and 
a map feature is non-zero. Put differently, the result of this operation only touches 
rows and columns that correspond to active features in the map. Since sparsity implies 
that the number of active features in €, ., is independent of the size of 0,1, the 
total number of non-zero elements in к, is also O(1). Consequently, the subtraction 
requires O(1) time. 


It remains to be shown that we can calculate Ф, from €). у in constant time. We begin 
with a consideration of the inverse of G4, which is efficiently calculated as follows: 


Gri = (IX ET ARE! 
(Т-ЕТ I Fp + ЕТ I F, +F? Aen) 
p E ———— 4 


=0 
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= (I— FI I F,+ FI (I+4) Fy)! 

= I-EPIE, -FI(Q--A)'ED 

= IF [(-A)'U-IDE, 
ee SS 


V, 
= Js (12.12) 


By analogy, we get for the transpose [GT]? = (I+ FT AT F,)7! = I + VT. Неге 
the matrix V, is only non-zero for elements that correspond to the robot pose. It is 
Zero for all features in the map, and hence can be computed in constant time. This 
gives us for our desired matrix Ф, the following expression: 


Ф = (Т+ФТ)О,_,(Т+%Ф,) 
= 11t OP 11+ 0-1 Wt OF O, 4 V, 
^ 
= Qia tA (12.13) 


where WV, is zero except for the sub-matrix corresponding to the robot pose. We nore 
because Q—1, A, is zero except for a finite number of elements, which correspond to 
active map features and the robot pose. 


Hence, Ф, can be computed from €),.., in constant time, assuming that €). is sparse. 
Equations ... are equivalent to Lines ... in Table 10.1, which proves the correctness of 
the information matrix update in EKF SLAM known correspondences. 


Finally, we show a similar result for the information vector. Line From (12.2) we 
obtain 


jn = quac ЕС б, (12.14) 
This implies for the information vector: 


& = 0, (Oy &-1 + FI) 

= M OF), & 12-0, ЕТ б, 

= (Oa — Q4 + 9 - D) о £a - i FT б 
= (0, –-Ф,+%Ф, -Qui t9) OF), & 4-0, FT 6, 


=0 =0 
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= (0,-Ф,+Ф,- 0,41) Qt & 1 Q3 OF, &—1+0, FP & 
x. = м = i-i = 1 


&—1+ (Ae — Kt) ш—1+0, ЕТ à (12.15) 


Since A, and к; are both sparse, the product (А, — к;) ш—1 only contains finitely 
many non-zero elements and can be calculated in constant time. Further, Ё e б, isa 
sparse matrix. The sparseness of the product О, FT 6, follows now directly from the 
fact that О, is sparse as well. 


12.4.2 Measurement Updates 


The second important step of SLAM concerns the update of the filter in accordance 
to robot motion. The measurement update in SEIF directly implements the general 
extended information filter update, as stated in Lines 6 and 7 of Table 3.5 on page 60: 


0, Q+ HI QT) Н, (12.16) 
& = &+Н Qi [а h(i) — Н. ш] (12.17) 


Writing the prediction 2, = A(ji;) and summing over all individual elements in the 
measurement vector leads to the form in Lines 13 and 14 in Table 12.3: 


% = Qr Hi орні (12.18) 


& = А+У HI QU [1-8 - Hi ш] (12.19) 


Here Q+, ô, q, and Н} are defined as before (e.g., Table 11.2 on page 272). 


12.5 SPARSIFICATION 


12.5.1 General Idea 


The key step in SEIFs concerns the sparsification of the information matrix Q+. Be- 
cause sparsification is so essential to SEIFs, let us first discuss it in general terms be- 
fore we apply it to the information filter. Sparsification is an approximation whereby a 
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posterior distribution is approximated by two of its marginals. Suppose a, b, and c are 
sets of random variables (not to be confused with any other occurrence of these vari- 
ables in this book!), and suppose we are given a joint distribution p(a, b, c) over these 
variables. To sparsify this distribution, suppose we would like to remove any direct 
link between the variables a and b. In other words, we would like to approximate p 
by a distribution р for which the following property holds: p(a | b,c) = p(a | c) and 
Р | a, c) = рф | c). In multivariate Gaussians, it is easily shown that this conditional 
independence is equivalent to the absence of a direct link between a and b, that is, the 
corresponding element in the information matrix is zero. 


A good approximation p is obtained by a term proportional to the product of the 
marginals, p(a, c) and p(b, c). Neither of these marginals retain dependence between 
the variables a and b, since they both contain only one of those variables. Thus, the 
product p(a, c) p(b, c) does not contain any direct dependencies between a and b; in- 
stead, a and b are conditionally independent given c. However, p(a, c) p(b, c) is not yet 
a valid probability distribution over a, b, and c. This is because c occurs twice in this 
expression. However, proper normalization by p(c) yields a probability distribution 
(assuming p(c) > 0): 


pla,b,c) = ———72—— (12.20) 


p(a,b,c) = (6 


CBS аг ae 
= dd; bie) a e, (12.21) 


In other words, removing the direct dependence between a and b is equivalent to ap- 
proximating the conditional p(a | b, c) by a conditional p(a | c). We also note (without 
proof) that among all approximations q of p where a and b are conditionally indepen- 
dent given c, the one described here is “closest” to p, where closeness is measured 
by the Kullback Liebler divergence, a common (asymmetric) information-theoretic 
measure of the “nearness” of one probability distribution to another. 


An important observation pertains to the fact that the original p(a | b, c) is at least as 
informative as p(a | c), the conditional hat replaces p(a | b, с) in p. This is because 
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p(a | b,c) is conditioned on a superset of variables of the conditioning variables in 
p(a | c). For Gaussians, this implies that the variances of the approximation p(a | c) 
is equal or larger than the variance of the original conditional, p(a | b,c). Further, 
the variances of the marginals (a), p(b), and р(с) are also larger than or equal to the 
corresponding variances of p(a), p(b), and p(c). In other words, it is impossible that 
the variance shrinks under this approximation. 


12.5.2 Sparsifications in SEIFs 


The SEIF applies the idea of sparsification to the posterior p(y; | 21:4, 1:5 C14). 
thereby maintaining an information matrix О, that is sparse at all times. This sparse- 
ness is at the core of SEIF's efficiency. We already remarked that sparsification is an 
approximative step, since information matrices in SLAM are naturally not sparse— 
even though normalized information matrices tend to be almost sparse. In the context 
of SLAM, it suffices to deactivate links between the robot pose and individual features 
in the map; if done correctly, this also limits the number of links between pairs of 
features. 


To see, let us briefly consider the two circumstances under which a new link may be 
introduced. First, observing a passive feature activates this feature, that is, introduces 
a new link between the robot pose and the very feature. Second, motion introduces 
links between any two active features. This consideration suggests that controlling 
the number of active features can avoid violation of both sparseness bounds. Thus, 
sparseness is achieved simply by keeping the number of active features small at any 
point in time. 


To define the sparsification step, it will prove useful to partition the set of all features 
into three disjoint subsets: 


m = m* m? + тт (12.22) 


where m7 is the set of all active features that shall remain active. The set m? are one 
or more active features that we seek to deactivate (remove the link to the robot). And 
finally, m~ are all currently passive features; they shall remain passive in the process 
of sparsification. Since m* U m contains all currently active features, the posterior 
can be factored as follows: 


p(yc | 21:6, Чг, Crt) (12.23) 


= plar m, m* m^ | 21.8, Wit, Crt) 
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= p(t | m?,m*,m ‚гъа, Urt, сыз) p(mP,m* m^ | zia uias cia) 


LL р(х. | то, т, т = 0, 21:6, U1:t; C14) p(m?, m”, m | 2 Чы, C14) 


In the last step we exploited the fact that if we know the active features m° and 
m^, the variable x, does not depend on the passive features m~. We can hence 
set m~ to an arbitrary value without affecting the conditional posterior over тү, 
p(zi | m9, mt, MT, zya, шл, 014). Here we simply chose m^ = 0. 


Following the sparsification idea discussed in general terms in the previous section, 


we now replace p(x; | m°, m+, тт = 0) by р(х, | тт, тт = 0), that is, we drop 


the dependence on m”. 


B(zi,m | 214, Чы, C14) (12.24) 


= 0 = 
= px | mt, m` = 0, 21:4, Чыл, С) p(m m*,m | 21:4; Чаа, Crt) 
This approximation is obviously equivalent to the following expression: 


Pre M | yas Чы, Сыа) (12.25) 


p(z,,m* | m = 0, 21, Чыл, Crt) 0 


ГА -€ 
p(m т ‚т Zia; Ча Crt) 


p(m* | т = 0, Z1:t, 41:4, Crt) 


12.5.3 Mathematical Derivation 


In the remainder of this section, we show that the algorithm SEIF sparsification 
in Table 12.4 implements this probabilistic calculation, and that it does so in con- 
stant time. We begin by calculating the information matrix for the distribution 
p(z,, m?,m* | m^ = 0) of all variables but m~, and conditioned on m^ = 0. 
This is obtained by extracting the sub-matrix of all state variables but m~: 


|? EN T T 
9; d Pant mo Ich 9; Pont mo E ned 


(12.26) 


With that, the matrix inversion lemma (Table 3.2 on page 44) leads to the following 


information matrices for the terms p(z;,m* | m^ = 0,214,u14, C14) and р(тт | 
MT = 0,214, чал, 1:1), denoted Q} and O2, respectively: 
1 _ o0 0 T оо -1 pT Q0 
Q = 9-0, Fa, (Fo, Qt Fimo) Б, % 


Оо? = -Ol ES CE O Erm) LI 900 (12.27) 


zx.mo zx.mo 
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Here the various F’-matrices are projection matrices that project the full state y; into 
the appropriate sub-state containing only a subset of all variables—in analogy to the 
matrix F, used in various previous algorithms. The final term in our approximation 
(12.25), p(m9, тт, m^ | zya, U1:t, C14), possesses the following information matrix: 


QU OOF Oey FTO (12.28) 


Putting these expressions together according to Equation (12.25) yields the following 
information matrix, in which the feature m is now indeed deactivated: 
О, = cL: 07 (12.29) 
= wV Fm (ЕТО? Рм) FA О? 
+ о? Е, то (BF no Q? Pom ЕТ us Q? 


=O; Fy (ЕТ О, Fa)! FE О 


The resulting information vector is now obtained by the following simple considera- 
tion: 


& = Q, Ht 

(О, — О, + 0,) Le 

= Ош + (О, — 9) ш 

= & + (0. — %) ш (12.30) 


This completes the derivation of Lines 3 and 4 in Table 12.4. 


12.6 AMORTIZED APPROXIMATE MAP 
RECOVERY 


The final update step in SEIFs is concerned with the computation of the mean p (since 
the consideration in this section does not depend on the time index t, it is simply 
omitted for brevity). Before deriving an algorithm for recovering the state estimate и 
from the information form, let us briefly consider what parts of и are needed in SEIFs, 
and when. SEIFs need the state estimate и of the robot pose and the active features in 
the map. These estimates are needed at three different occasions: 
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1. The mean is used for the linearization of the motion model, which takes place in 
Lines 3, 4, and 10 in Table 12.2. 


2. It is also used for linearization of the measurement update, see Lines 6, 8, 10, 13 
in Table 12.3. 


3. Finally, it is used in the sparsification step, specifically in Line 4 in Table 12.4. 


However, we never need the full vector u. We only need an estimate of the robot pose, 
and an estimate of the locations of all active features. This is a small subset of all 
state variables in и. Nevertheless, computing these estimates efficiently requires some 
additional mathematics, as the exact approach for recovering the mean via pp = Q^! € 
requires matrix inversion—even when recovering a subset of variables only. 


Once again, the key insight is derived from the sparseness of the matrix О. In particular 
the sparseness enables us do define an iterative algorithm that enables us to recover 
state variables online, as the data is being gathered and the estimates © and О are 
being constructed. To do so, it will prove convenient to reformulate и = (27! £ as an 
optimization problem. As we will show in just a minute, the state и is the mode 


Ёй = argmax p(s) (12.31) 
H 


of the following Gaussian distribution, defined over the variable и: 
р(и) = mexp(-iuTO uc£T au) (12.32) 


Here џ is a vector of the same form and dimensionality as p. To see that this is indeed 
the case, we note that the derivative of p( u) vanishes at и = QT! £: 


ЧАР (Ош) ехр{-1 Rutu} = 0 (12.33) 


which implies О u = € or, equivalently, и = Q^! £. 


This suggests that recovering the state vector u is equivalent to finding the mode of 
(12.32). Thus, it transforms a matrix inversion problem into an optimization problem. 
For this optimization problem, we will now describe an iterative hill climbing algo- 
rithm which, thanks to the sparseness of the information matrix, requires only constant 
time per optimization update. 


322 CHAPTER 12 


Our approach is an instantiation of coordinate descent. For simplicity, we state it here 
for a single coordinate only; our implementation iterates a constant number K of such 
optimizations after each measurement update step. The mode 2 of (12.32) is attained 
at: 


5 
| 


= argmax exp (-3 uT Q u -- £T m 
H 


= argmin iu Qu- £T p (12.34) 
H 


We note that the argument of the min-operator in (12.34) can be written in a form that 
makes the individual coordinate variables р; (for the i-th coordinate of u+) explicit: 


iu OE = iM ущ Ош У (12.35) 
a j a 


where Q; j is the element with coordinates (i, j) in the matrix О, and £; if the i-th 
component of the vector €. Taking the derivative of this expression with respect to an 
arbitrary coordinate variable ш; gives us 


д 
Bu Уу Уу ш 55 i - mp = у uyu (12.36) 
i j i j 


Setting this to zero leads to the optimum of the 7-th coordinate variable и; given all 
other estimates uj: 


ш = Ор} |&—)у Орун; (12.37) 
2% 


The same expression сап conveniently be written in matrix notation. Here we define 
F,=(0...0 1 0...0) to bea projection matrix for extracting the i-th component 
from the matrix 0: 


ш = (ROFD'HÀli-90pu-90F7 F; u] (12.38) 
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This consideration derives our incremental update algorithm. By repeatedly updating 
ш —— (ROE) 1K (€-Qut+QFP F; y (12.39) 


for some element of the state vector и; reduces the error between the left-hand side 
and the right-hand side of Equation (12.38). Repeating this update indefinitely for all 
elements of the state vector converges to the correct mean (without proof). 


As is easily seen, the number of elements in the summation in (12.37), and hence the 
vector multiplication in the update rule (12.39), is constant if Q is sparse. Hence, each 
update requires constant time. To maintain the constant-time property of our SLAM 
algorithm, we can afford a constant number of updates K per time step. This will 
generally not lead to convergence, but the relaxation process takes place over multiple 
time steps, resulting in small errors in the overall estimate. 


However, a note of caution is in order. The quality of this approximation depends 
on a number of factors, among them the size of the largest cyclic structure in the 
map. In general, a constant number of K updates might be insufficient to yield good 
results. How to best update џ is presently an open question. Also, there exists а 
number of optimization techniques that are more efficient than the coordinate descent 
algorithm described here. A “classical” example is conjugate gradeitn. In practical 
implementations it is advisable to rely on efficient optimization techniques to recover 


I. 


12.7 HOW SPARSE SHOULD SEIFS BE? 


To determine the degree of sparseness one is willing to accommodate in SEIFs, one 
should compare a sparse SEIF to a SEIF without the sparsification step and with 
closed-form map recovery. Such a SEIF is functionally equvalent to the EKF SLAM 
alorithm. 


The following comparison characterizes the three key performance measures that set 
sparse SEIFs apart from EKFs. Our comparison is based on a simulated robot world, 
in which the robot senses the range, proximity, and identity of nearby landmarks, 


1. Computation. Figures 12.6 compares the computation per update in SEIFs with 
that in EKFs; in both cases the implementation is optimized. This graph illus- 
trates the major computational ramification of the probabilistic versus informa- 
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landmarks: 50 links: 1275 


landmarks: 50 links: 448. 


Figure 12.5 Comparison of SEIF without sparsifi cation (top row) with SEIF with 4 active 
landmarks (bottom row)m in a simulated environment with 50 landmarks. In each row, 
the left panel shows the set of links in the fi Iter; the center panel the correlation matrix; and 
the right panel the noramlized information matrix. Obviously, the sparsifi ed SEIF maintains 
many fewer links, but its result is less confi dent as indicated by its less-expressed correlation 
matrix. 


tion representation in the filter. While EKFs indeed require time quadratic in the 
map size, SEIFs level off and require constant time. . 


2. Memory. Figure 12.7 compares the memory use of EKFs with that of SEIFs. 
Here once again, EKFs scale quadratically, whereas SEIFs scale linearly, due to 
the sparseness of its information representation. 


3. Accuracy. Here EKFs outperform SEIFs, due to the fact that SEIFS require ap- 
proximation for maintaining sparseness, and when recovering the state estimate 
Ht. This is shown in Figure 12.8, which plots the error of both methods as a 
function of map size. 


The number of acitve features in SEIFs determines the degree of sparseness. This 
effective trades off two factors: the computational efficiency of the SEIF, and the 
accuracy of the result. When implementing a SEIF algorithm, it is therefore advisable 
to get a feeling for this trade-off. 
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Figure 12.6 The comparison of average CPU time between SEIF and ЕКЕ. 
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Figure 12.7 Тһе comparison of average memory usage between SEIF and ЕКЕ. 
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Figure 12.8 The comparison of root mean square distance error between SEIF and ЕКЕ. 
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Figure 12.9 The update time of the EKF (leftmost data point only) and the SEIF, for 


different degrees of sparseness, as induced by a bound on the number of active features as 
indicated. 
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Figure 12.10 The approximation error EKF (leftmost data point only) and SEIF for dif- 
ferent degrees of sparseness. In both fi gures, the map consists of 50 landmarks. 
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One way to get a feeling for the effect of the degree of sparseness can be obtained via 
simulation. Figure 12.9 plots the update time and the approximation error as a func- 
tion of the number of active landmarks in the SEIF update, for a map consisting of 50 
landmarks. The update time falls monotonically with the number of active features. 
Figure 12.10 shows the corresponding plot for the error, comparing the EKF with 
the SEIF at different degrees of sparseness. The solid line is the SEIF as described, 
whereas the dashed line corresponds to a SEIF that recovers the mean p; exactly, via 
matrix inversion and multiplication. As this plot suggests, 6 active features seem to 
provide competitive results, at significant computational savings over the EKF. For 
smaller numbers of active features, the error increases drastically. A careful imple- 
mentation of SEIFs will require the experimento to vary this important parameter, and 
graph its effect on key factors as done here. 


328 CHAPTER 12 


12.8 INCREMENTAL DATA 
ASSOCIATION 


We will now turn our attention to the problem of data association in SEIFs. Our first 
technique will be the familiar incremental approach, which greedily identifies the most 
liekly correspondence, and then treat this value as if it was ground truth. We already 
encountered an instance of such a greedy data association technique in Chapter 10.3, 
where we discussed data association in the EKF. In fact, the only difference between 
greedy incremental data associaiton in SEIFs and EKSs pertains to the calculation of 
the data association probability. As a rule of thumb, computing this probability is 
generally more difficult in an information filter than in a probabilistic filter such as the 
ЕКЕ, since the information filter does not keep track of covariances. 


12.8.1 Computing Data Association 
Probabilities 


As before, the data association vector at time will be denoted с;. The greedy incre- 
mental technique maintains a set of data assiciation guesses, denoted ¢,.,. At time t, 
we are already given from previous updates a set б1.,—1. The data associaton step then 
pertains to the estimation of the most likely value for the data association variable 6; 
at time t. This is achieved via the following maximum likelihood estimator: 


C= argmax p(2¢ | 214—1, Чыл, €14—1, Ct) 
Ct 

= argmax | p(z | Yt, сь) P(Y | Z1:t—1, Чыл, 614-1) dyt 
Ct 


9,62 


= argmax | f ple | Ж, Yous Ct) P( Les Yo, | Z1:t-15 Uii C141) (12.40) 


Our notation p(zz | £t, Ye,, Ct) of the sensor model makes the correspondence variable 
c, explicit. Calculating this probability exactly is not possible in constant time, since 
it involves marginalizing out almost all variables in the map (which requires the inver- 
sion of a large matrix). However, the same type of approximation that was essential 
for the efficient sparsification can also be applied here as well. 


In particular, let us denote by т} the combined Markov blanket of the robot pose x+ 
and the landmark уг,. This Markov blanket is the set of all features in the map that 
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Figure 12.11 The combined Markov blanket of feature yn and the obsrved features is 
usually suffi cient for approximating the posterior probability of the feature locations, con- 
ditioning away all other features. 


are linked to the robot of landmark y.,. Figure 12.11 illustrates this set. Notice that 
m includes by definition all active landmarks. The spareness of О, ensures that mo 
contains only a fixed number of features, regardless of the size of the тар Л. If the 
Markov blanket of x; and of y.,, further features are added that represent the shortest 
path in the information graph between x, and of ye,. 


All remaining features will now be collectively referred to as mz, that is: 


m. = m-mj-íya) (12.41) 


Ct 


The set m}, contains only features which have only a minor impact on the tar- 
get variables, x, and y,,. Our approach approximates the probability p(x;, Ye, | 
Z14—1, Uist, €14—1) in Equation (12.40) by essentially ignoring these indirect influ- 
ences: 


p(zi, Че, | Z1:t—15 01:1, 6141) 


e: + m- ё 2 dmz 
= ШЕ | 214—1, 81:55 ёге) йты, йты, 


ШЕ | ma ma zai ue ёле) pim? | m5 , 21:21, Utt был—1) 
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p(mz, | ziu—1, Urt, $141) dmi, dmz, (12.42) 
ду + = у ^ 
ды Eo | Me» TR, = Io Z1:t—1, Чы, был—1) 


F ATEOS P + 
p(m;. | Me, = Hei Z1:t—1; U1:t; €14—1) dm, 


This probability can be computed in constant time if the set of features considered in 
this calculation is independent of the map size (which it generally is). In complete 
analogy to various derivations above, we note that the approximation of the poste- 
rior is simply obtained by carving out the submatrix corresponding to the two target 
variables: 


= F T —1 
Xue, 22d T asit: (2 ры mz, 9, Erstellt) Кае, 
Htc, = Mtl as уе, (12.43) 


This calculation is constant time, since it involves a matrix whose size is independent 
of N. From this Gaussian, the desired measurement probability in Equation (12.40) is 
now easily recovered, as described in Section ??. 


As in our EKF SLAM algorithm, features are labeled as new when the likelihood 
р(& | z14—1, 01:0, C1:4-1, с) remains below a threshold a. We then simply set б, = 
С. + 1 and С, = Сп + 1. Otherwise the size of the map remains unchanged, 
that is, C, = Cj, , and the value ё; is chosen that maximizes the data association 
probebility. 


As last caveat, sometimes the combines Markov blanket is insufficient, in that it does 
not contain a path between the robot pose and the landmark that is being tested for 
corresponence. This will usually xbe the case when closing a large cycle in the envi- 
ronment. Here we need to augment the set of features má by a set of landmarks along 
at least one path between me, and the robot pose x+. Depending on the size of thie 
cycle, the numbers of landmarks contained in the resulting set may now depend on N, 
the size of the map. We leave the details of such an extension as an exercise. 


12.8.2 Practical Considerations 


In general, the incremental greedy data association technique is brittle. Spurious mea- 
surements can easily cause false associations, and induce significant errors into the 
SLAM estimate. 
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Figure12.12 The vehicle used in our experiments is equipped with a 2D laser range fi nder 
and a differential GPS system. The vehicle's ego-motion is measured by a linear variable 
differential transformer sensor for the steering, and a wheel-mounted velocity encoder. In 
the background, the Victoria Park test environment can be seen. 


The standard approach—in EKFs and SEIFs alike—-pertains to the creation of a can- 
didate list. For any detected object that can not be explained by existing features, a 
new feature candidate is generated but not put into SEIF directly. Instead it is added 
into the candidate list with a weight representing its probability of being a useful fea- 
ture. In the next measurement step, the newly arrived candidates are checked against 
all candidates in the waiting list; reasonable matches increase the weight of corre- 
sponding candidates. Candidates that are not matched lose weight because they are 
more likely to be a moving object. When a candidate has its weight above a certain 
threshold, it joins the SEIF network of features. 


We notice that data association violates the constant time property of SEIFs. This is 
because when calculating data associations, multiple features have to be tested. If we 
can ensure that all plausible features are already connected in the SEIF by a short path 
to the set of active features, it would be feasible to perform data association in constant 
time. In this way, the SEIF structure naturally facilitates the search of the most likely 
feature given a measurement. However, this is not the case when closing a cycle for 
the first time, in which case the correct association might be far away in the SEIF 
adjacency graph. 


The remainder of this section describes an implementation of the SEIF algorithm us- 
ing a physical vehicle. The data used here is a common benchmark in the SLAM 
field [11, 23, 27]. This data set was collected with an instrumented outdoor vehicle 
driven through a park in Sydney, Australia. 
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Figure 12.13 The testing environment: A 350 meters by 350 meters patch in Victoria 
Park in Sydney. Overlayed is the integrated path from odometry readings. 


Figure 12.14 The path recovered by the SEIF, is correct within +1m. 


The Sparse Extended Information Filter 333 


200 
150 


100 


North(meter) 


-50 


-100 


-150 
-200 -150 -100  -50 0 50 100 150 200 


East(meter) 


Figure 12.15 Overlay of estimated landmark positions and robot path. 


The vehicle and its environment are shown in Figures 12.12 and 12.13, respectively. 
The robot is equipped with a SICK laser range finder and a system for measuring steer- 
ing angle and forward velocity. The laser is used to detect trees in the environment, 
but it also picks up hundreds of spurious features such as corners of moving cars on 
a nearby highway. The raw odometry, as used in our experiments, is poor, resulting 
in several hundred meters of error when used for path integration along the vehicle's 
3.5km path. This is illustrated in Figure 12.13, which shows the path of the vehicle. 
The poor quality of the odometry information along with the presence of many spuri- 
ous features make this dataset particularly amenable for testing SLAM algorithms. 


The path recovered by the SEIF is shown in Figure 12.14. This path is quantitatively 
indistinguishable from the one produced by the EKF. The average position error, as 
measured through differential GPS, is smaller than 0.50 meters, which is small com- 
pared to the overall path length of 3.5 km. Comparing with EKF, SEIF runs approxi- 
mately twice as fast and consumes less than a quarter of the memory EKF uses. 
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Step t — 62 


Step t — 65 | Step t = 85 


Step t = 89 


Step t = 500 


Figure 12.16 Snapshots from our multi-robot SLAM simulation at different points in 
time. Initially, the poses of the vehicles are known. During Steps 62 through 64, vehi- 
cle 1 and 2 traverse the same area for the first time; as a result, the uncertainty in their 
local maps shrinks. Later, in steps 85 through 89, vehicle 2 observes the same landmarks 
as vehicle 3, with a similar effect on the overall uncertainty. After 500 steps, all landmarks 
are accurately localized. 
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12.9 TREE-BASED DATA ASSOCIATION 


SEIFs make it possible to define a radically different data association approach, which 
can be proven to yield the optimal results (although possibly in exponential time). The 
technique is built on three key insights: 


m Just like the EIF studied in the previous chapter, SEIFs make is possible to add 
"soft" data association constraints. Given two features m; and mij, a soft data 
association constraints is nothing else but an information link that forces the dis- 
tance between m; and m; to be small. We already encountered examples of such 
soft links in the previous chapter. In sparse extended information filters, intro- 
ducing such a link is a simple, local addition of values in the information matrix. 


m We can also easily remove soft asociation constraints. Just as introducing a new 
constraints amounts to a local addition in the information matrix, removing it is 
nothing else but a a local subtraction. Such an “undo” operation can be applied 
to arbitrary data association links, regardless when they were added, or when the 
respective feature was last observed. This makes it possible to revise past data 
association decisions. 


m The ability to freely add and subtract data associations arbitrarily enables us to 
search the tree of possible data associations in a way that is both efficient and 
complete—as will be shown below. 


To develop tree-based data association, it shall prove useful to consider the data asso- 
ciation tree that is defined over sequences of data association decisions over time. At 
each point in time, each observed feature can be associated with a number of other fea- 
tures, or considered a new, previously unobserved feature. The resulting tree of data 
association choices, starting at time t = 1 all the way to the present time, is illustrated 
in Figure 12.17a. Of course, the tree grows exponentially over time, hence searching 
it exhaustively is impossible. The greedy approach described in the previous section, 
in contrast, follows one path through this tree, defined by the locally most likely data 
associations. Such a path is visualized in Figure 12.17a as the thick gray path. 


Obviously, if the incremental greedy approach succeeds, the resulting path is optimal. 
However, the incremental greedy technique may fail. Once a wrong choice has been 
made, the incremental approach cannot recover. Moreover, wrong data association 
decisions introduce errors in the map which, subsequently, can induce more errors in 
the data association. 


The approach discussed in the remainder of this chapter generalizes the incremental 
greedy algorithm into a full-blown search algorithm for the tree that is provably op- 
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timal. Of course, following all branches in the tree is intractable. However, if we 
maintain the log-likelihood of all nodes on the fringe of the tree extracted thus far, 
we can guarantee optimality. Figure 12.17b illustrates this idea: The tree-based SEIF 
maintains not just a single path through the data association tree, but an entire fron- 
tier. Every time a node is expanded (e.g., through incremental ML), all alternative 
outcomes are also assessed and the corresponding likelihoods are memorized. This is 
illustrated in Figure 12.17b, which depicts the log-likelihood for an entire frontier of 
the tree. 


Finding the maximum in Equation (12.40) implies that the log-likelihood of the chosen 
leaf is greater or equal to that of any other leaf at the same depth. Since the log- 
likelihood decreases monotonically with the depth of the tree, we can guarantee that 
we indeed found the optimal data association values when the log-likelihood of the 
chosen leaf is greater or equal to the log-likelihood of any other node on the frontier. 
Put differently, when a frontier node assumes a log-likelihood greater than the one of 
the chosen leaf, there might be an opportunity to further increase the likelihood of the 
data by revising past data association decisions. Our approach then simply expands 
such frontier nodes. If an expansion reaches a leaf, this leaf is chosen as the new 
data association; otherwise the search is terminated when the entire frontier possesses 
values that are all smaller or equal to the one of the chosen leaf. This approach is 
guaranteed to always maintain the best set of values for the data association variables; 
however, occasionally it might require substantial search. 


12.9.1 Calculating Data Association 
Probaiblities 


We have now, on numerous occasions, discussed techniques for calculating data as- 
sociation probabilities. In our case we need a technique which, for any two features 
in the map, calculates the probability of equality. This is different from previous fil- 
ter techniques, in which we only considered the most recent measurements, but the 
mathematical derivation is essentially the same. 


Table 12.6 lists an algorithm that tests the probability that two features in the map 
are one and the same— this test is sufficient to implement the greedy data association. 
The key calculation here pertains to the recovery of a joint covariance and mean vector 
over a small set of map elements В. To determine whether two features in the map 
are identical, we need to consider the information links between them. Technically, 
the more links are included in this consideration, the more accrate the result, but at 
the expense of increased compuation. In practice, it usually suffices to identify the 
two Karkov blankets of the features in question. A Markov blanket of a feature is the 
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Figure12.17 (a) The data association tree, whose branching factor grows with the number 
of landmarks in the map. The tree-based SEIF maintains the log-likelihood for the entire 
frontier of expanded nodes, enabling it to fi nd alternative paths. (c) Improved path. 
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1: Algorithm SEIF correspondence (е5(0, £, ij, mj, my): 
1: let B(j) be the blaket of m; 
1: let B(k) be the blaket of mj. 
1: В = В(ј) U B(k) 
1: if B(j) п B(k) = 0 
1: add features along the shortest path between m; and m; to В 
1: endif 
0...0 1 O0 0...0 
0...0 0 1 0...0 TA 
0...0 1 0 0...0 
1 Ев = 0...0 0 1 0...0 
0...0 
2 0...0 
1: (size (2N + 3) by 2|В|) 
1: Eg = (Fg Q FẸ)! 
1: ив = Ув Ев & 
0...0 1 0 0.0 —1 O 0...0 
1: Ед = | 0-0 0 1 0...0 0 —1 0...0 
feature m; feature m; 
1: Ул = (FA Q FF)! 
1: ua = Уд Ел € 
1 
1: return det(27 X4) 2 ехр{—5 wk УХ! ua) 


Table 12.6 The SEIF SLAM test for correspondence. 


feature itself, and all other features that are connected via a non-zero element in the 
information matrix. In most cases, both Markov blankets intersect; if they do not, the 
algoithm in Table 12.6 identifies a path between the landmarks (which must exis if 
both were observed by the same robot). 


The algotihm in Table 12.6 then proceeds by cutting out a local information matrix 
and informaiton vector, employing the very same mathamtical “trick” thta led to an 
efficient sparsification step: we condition away features outside the Markov blankets. 
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As a result, we obtain an efficient technique for calculating the desired probability, one 
that is approximate (because of the conditioning), but works very well in practoce. 


The result is interesting in two dimensions. First, as before, it les us make a data asso- 
ciation decision. But second, it provides the likelihood of this decision. The logarithm 
of the result of this procedure corresponds to the log-likelihood of this specific data 
item, and summing those up along the path in the data associaiton tree becomes the 
total dta log-likelihood under a specific association. 


12.9.2 Tree Search 


The tree-based data association technique uses a search procedure for considering 
alternative data association decisions not just at the present time step, but also for time 
steps in the past. A simple argument (reminiscent of that underlying the correctness of 
the A* algorithm [32]) enables us to drastically reduce the number of nodes expended 
during this search. Figure 12.17b illustrates the basic idea: Our approach maintains 
not just a single path through the data association tree, but an entire frontier. Every 
time a node is expanded (e.g., through incremental ML), all alternative outcomes are 
also assessed and the corresponding likelihoods are memorized. This is illustrated 
in Figure 12.17b, which depicts the log-likelihood for an entire frontier of the tree. 
Notice that we chose to represent the likelihood values as log-likelihoods, which is 
numerically more stable than probabilities. 


The goal of the tree search is to maximize the overall data likelihood 


Ĉit =  argmax p(z140 | u14, а) (12.44) 


Сї: 


Notice the difference to Equation (12.40), which only maximizes the most recent data 
association. Finding the maximum in Equation (12.44) implies that the log-likelihood 
of the chosen leaf is greater or equal to that of any other leaf at the same depth. Since 
the log-likelihood decreases monotonically with the depth of the tree, we can guaran- 
tee that we indeed found the optimal data association values when the log-likelihood 
of the chosen leaf is greater or equal to the log-likelihood of any other node on the 
frontier. Put differently, when a frontier node assumes a log-likelihood greater than 
the one of the chosen leaf, there might be an opportunity to further increase the likeli- 
hood of the data by revising past data association decisions. Our approach then simply 
expands such frontier nodes. If an expansion reaches a leaf, this leaf is chosen as the 
new data association; otherwise the search is terminated when the entire frontier pos- 
sesses values that are all smaller or equal to the one of the chosen leaf. This approach is 
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guaranteed to always maintain the best set of values for the data association variables; 
however, occasionally it might require substantial search. 


12.9.3 Equivalency Constraints 


Once two features in the map have determined to be equivalent in the data association 
search, we have to add a soft link. Suppose the first feature is m; and the second is m;. 
The soft link constraints their position to be equal through the following exponential- 
quadratic constraint 


exp (73 (m; = mj)" € (m; = m;)] (12.45) 


Here C is a diagonal penalty matrix. The larger the elements on the diagonal of C, 
the stronger the constraint. It is easily seen that this non-normalized Gaussian can be 
written as a link between m; and m, in the information matrix. Simplyu define the 
projection matrix 


0.010 0-0 —1 0 0-0 
Pipes ex | бм ШЕЛ. 0250 10 eb 0260 (12.46) 


This matrix mapes the state y; to the difference m; — тту. Thus, the expression (12.45) 
becomes 


exp ics (Fmi;-m; ye) C (Fimm y)} = Ve B C Р, -т(А7) 


Thus, to implement this soft constraint, we have to add EL cus C Fmi—m; to the 
information matrix, while leaving the information vector unchanged: 


О, <  Q FEL aC Fn; (12.48) 


Clearly, the additive term is sparse: it only contains non-zero off-diagonal elements 
between the features m; and mj. Once a soft link has been added, it can be removed 
by the inverse operation 


О, < Mh- FL, C Fmi-m; (12.49) 
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Figure 12.18 (a) Mine map with incremental ML scan matching and (b) using our lazy 
data association approach. The map is approximately 250 meters wide, and acquired with- 
out odometry information. 


In SEIFs, This removal can occur even if the constraint was added at a previous time 
step. However, careful bookkeeping is necessary to guarantee that we never remove a 
non-existend data associaiton constraints—otherwise the information matrix may no 
longer be positive semidefinite, and the resulting belief might not correspond to a valid 
probability distribution. 


12.9.4 Practical Considerations 


In any competitive implementation of this approach, there will usually only a small 
number of data association paths that are plausible at any point in time. When closing 
a loop in an indoor environment, for example, there are usually at most three plausible 
hypothesis: a closure, a continuation on the left, and a continuation on the right. AII 
but all quickly should become unlikely, so the number of times in which the tree is 
searched recursively should be small. 


One way to make the data association succeed more often is to incorporate negative 
measurement information. Negative information pertains to situations where a robot 
fails to see a measurement. Range sensors, which are brought to bear in our imple- 
mentation, return positive and negative information with regards to the presence of 
objects in the world. The positive information are object detections. The negative 
information applies to the space between the detection and the sensor. The fact that 
the robot failed to detect an object closer than its actual reading provides information 
about the absence of an object within the measurement range. 
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Figure 12.19 (a) Log-likelihood of the actual measurement, as a function of time. The 
lower likelihood is caused by the wrong assignment. (b) Log-likelihood, when recursively 
fi xing false data association hypotheses through the tree search. The success is manifested 
by the lack of a distinct dip. 
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Figure 12.20 Example of the tree-based data association technique: (a) When closing а 
large loop, the robot fi rst erroneously assumes the existence of a second, parallel hallway. 
However, this model leads to a gross inconsistency as the robot encounters a corridor at a 
right angle. At this point, the approach recursively searches for improved data association 
decisions, arriving on the map shown in diagram (b). 


An approach that evaluate the effect of a new constraint on the overall likelihood con- 
siders both types of information: positive and negative. Both types are obtained by 
calculating the pairwise (mis)match of two scans under their pose estimate. When 
using range scanners, one way to obtain a combination of positive and negative infor- 
mation is by superimposing a scan onto a local occupancy grid map build by another 
scan. In doing so, it is straightforward to determine the probability of a measurement 
in a way that incorporates both the positive and the negative information. 


The remainder of this section highlights practical results achieved using SEIFs with 
tree-based data association. The left panel of Figure 12.18a depicts the result of incre- 
mental ML data association, which is equivalent to regular incremental scan matching. 
Clearly, certain corridors are represented doubly in this map, illustrating the shortcom- 
ings of the ML approach. The right panel, in comparison, shows the result. Clearly, 
this map is more accurate than the one generated by the incremental ML approach. Its 
diameter is approximately 250 meters wide, and the floor of the mine is highly uneven. 
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Figure 12.21 (а) Path of the robot. (b) Incremental ML (scan matching) (c) FastSLAM. 
(d) Our approach. 


Figure 12.19a illustrates the log-likelihood of the most recent measurement (not the 
entire path), which drops significantly as the map becomes inconsistent. At this point, 
the SEIF engages in searching alternative data association values. It quickly finds the 
"correct" one and produces the map shown in Figure 12.18b. The area in question is 
shown in Figure 12.20, illustrating the moment at which the likelihood takes its dip. 
The log-likelihood of the measurement is shown in Figure 12.19b. 
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12.10 MULTI-VEHICLE SLAM 


The SEIF is also applicable to multi-robot SLAM problem. The multi-robot SLAM 
problem involves several robots that independently explore and map an enviroment, 
with the eventual goal of integrating their maps into a single, monolithic map? This 
problem raises two key questions: First, how can robot teams establish correspondence 
between individual robot maps. Second, once correspondence has been established, 
what are the mechanics of integrating maps? 


12.10.1 Fusing Maps Acquired by Multiple 
Robots 


Let OF, £F and Q7 ; ¿l two local estimates (maps and vehicle poses) acquired by two 
different vehicles, k and j. To fuse these maps, we need two pieces of information: a 
relative coordinate transformation between these two maps (translation and rotation), 
and a correspondence list, that is, a list of pairs of landmarks that correspond to each 
other in the different maps. 


Suppose we are given the translation d and the rotation matrix r that specify the co- 
ordinate transformation from the j-th to the k-th robot's coordinate system—we will 
discuss our approach to finding d and r further below. Coordinates y in the j-th robot's 
coordinate system are mapped into the k-th coordinate system via the linear equation 
y^ = ry + d. This transformation is easily extended to the filter variables (07, 2) 


ОР? = ВТО Б (12.50) 
ef = (8 +0} DT) RT (12.51) 


where R and D are matrices that extend r and d to the full dimension of the posterior 
maintained by the 7-th robot: 


1 0 0 а 
O0 mr -- 0 d 

[p QUUM ME ad D zm. (12.52) 
б 4 ier ow d 


Notice the special provision for the robot’s heading direction, which is the very first 
element in the state vector. The heading simply changes by the angle of the rotation 
between both maps, denoted a in (12.52). 
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Figure 12.22 Eight local maps obtained by splitting the data into eight disjoint sequences. 


Figure 12.23 the multi-robot result, obtained using the algorithm described in this paper. 
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To see the correctness of (12.50) and (12.51), we recall that the parameters (07, 21) 


define a Gaussian over the 7-th robot pose ans map хі Ex хі Y )T. This gives us 
the following derivation: 


р(х} | 74, О?) 
ox exp - i(R xi-D- rud Ser (Ех -р-– ш) 
a exp-3xp R Xt R xi — (ai +D)" Xv ^ RT xi 
= exp-ixiT" RT QÍ R xi – (+0) DT) RT xi (12.53) 


The key observations here are that the alignment takes time linear in the state vector 
(and not cubic as would be the case for EKFs). More importantly, the sparseness is 
preserved by this update step. The reader may also notice that the transformation can 
be applied to subsets of features (e.g., a local map), thanks to the sparseness of Оў. In 
such a case, one has to include the Markov blanket of the variables of interest. 


After the alignment step, both maps are expressed in the same coordinate system. The 
joint information state is obtained by concatenating both information matrices and 
both information states. The correspondence list is then incorporated into this joint 
map by collapsing the corresponding rows and columns of the resulting information 
matrix and vector. The following example illustrates the operation of collapsing fea- 
ture 2 and 4 in the filter, which would occur when our correspondence list states that 
landmark 2 and 4 are identical: 


hj hi» hig hj 


hor hog һоз h hu hio hi4 his 
n ho A ho. ==: hoi--hai Һәә -һаә-ҺодА-Һад — ha34- 0.54) 
€ 4 h h Xe ^ 
hai ha» h43 h44 31 32 34 33 
: А 
== |е (12.55) 
&з & 
£4 


Collapsing the information state exploits the additivity of the information state. The 
viability of a data fusion hypothesis is finally assessed by computing the likelihood 
of the data after fusing two maps. This is achieved by plugging the fused map into 
the original two Gaussians defining each vehicles’ local maps, and by multiply the 
resulting probabilities. This calculation plays an important role in our algorithm's 
decision to accept or reject a fusion hypothesis. Technically, this operation involves 
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a recovery of the state, and the evaluation of two Gaussians (one per robot). The 
mathematics for doing so are straightforward and omitted for brevity. 


12.10.2 Establishing Correspondence 


The data association between multiple robots is mathematically identical to the prob- 
lem with a single robot relative to its map, with the important difference than in multi- 
robot SLAM, the correspondence problem is global, that is, any pairs of features in 
two robots’ maps can correspond. This makes global correspondence computatinally 
expensive. 


The previous section provided a method for evaluating the goodness of a map fusion 
candidate, but left open how such candidates are found. Finding good candidates for 
fusing maps is essentially a hybrid search problem, involving continuous (alignment) 
and discrete (correspondence) variables. 


Our approach performs this search in two states. First, it searches for corresponding 
pairs of local landmark configurations in different maps. In particular, our approach 
identifies for each landmark in each map all triplets of three adjacent landmarks that 
fall within a small radius (a similar grouping was used in [39]). The relative distances 
and angles in these triplets are then memorized in an SR-tree, to facilitate efficient 
retrieval. Using these SR-trees, similar local configurations are easily identified in 
different maps by searching the tree. Correspondences found in this search serve as a 
starting hypotheses for the fusion process; they are also used to recover the alignment 
parameters, the rotation r and the shift d (using the obvious geometric laws). 


When an appropriate candidate has been identified, the space of possible data associa- 
tions is searched recursively, by assuming and un-assuming correspondences between 
landmarks in the different maps. The search is guided by two opposing principles: 
The reduction of the overall map likelihood that comes from equating two landmarks, 
and the increase in likelihood that results from the fact that if there were really two 
separate landmarks, both robots should have detected both of them (and not just one). 
To calculate the latter, we employ a sensor model that characterizes “negative” infor- 
mation (not seeing a landmark). 


In general, the search for the optimal correspondence is NP-hard. However, in all our 
experiments with real-world data we found hill climbing to be successful in every sin- 
gle instance. Hill climbing is extremely efficient; we suspect it to be in O(N log N) 
for maps of size N. In essence, it associates nearby landmarks if, as a result, the over- 
all likelihood increases. Once the search has terminated, a fusion is finally accepted 
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if the resulting reduction of the overall likelihood (in logarithmic form) is offset by 
the number of collapsed landmarks times a constant; this effectively implements a 
Bayesian MAP estimator with an exponential prior over the number of landmarks in 
the world. 


In our implementation, the robots are informed of their initial pose. This is a common 
assumption in multi-robot SLAM, necessary for the type linearization that is applied 
both in EKFs and SEIFs [30]. Recent work that enables vehicles to build joint maps 
without initial knowledge of their relative pose can be found in [12, 37, 41]. 


Our simulation involves a team of three air vehicles. The vehicles are not equipped 
with GPS; hence they accrue positioning error over time. Figure 12.16 shows the joint 
map at different stages of the simulation. As in [30], we assume that the vehicles com- 
municate updates of their information matrices and vectors, enabling them to generate 
a single, joint map. As argued there, the information form provides the important ad- 
vantage over EKFs that communication can be delayed arbitrarily, which overcomes a 
need for tight synchronization inherent to the EKF. This characteristic arises directly 
from the fact that the information matrix О, and the information vector £; in SEIFs 
is additive, whereas covariance matrices аге not. In particular, let (Q£, £?) be the pos- 
terior of the i-th vehicle. Assuming that all posteriors are expressed over the same 
coordinate system and that each map uses the same numbering for all landmarks, the 
joint posterior integrating all of these local maps is given by (У), 02, У), &/). This ad- 
ditive nature of the information form is well-known, and has in the context of SLAM 
previously been exploited by Nettleton and colleagues [30]. SEIFs offer over the work 
in [30] that the messages sent between vehicles are small, sue to the sparse nature 
of the information form. A related approach for generating small messages in multi- 
vehicle SLAM has recently been described in [29]. 


Figure 12.16 shows a sequence of snapshots of the multi-vehicle system, using 3 dif- 
ferent air vehicles. Initially, the vehicle start our in different areas, and the combined 
map (illustrated by the uncertainty ellipses) consists of three disjoint regions. During 
Steps 62 through 64, the top two vehicles discover identical landmarks; as a result, 
the overall uncertainty of their respective map region decreases; This illustrates that 
the SEIF indeed maintains the correlations in the individual landmark’s uncertainties; 
albeit using a sparse information matrix instead of the covariance matrix. Similarly, 
in steps 85 through 89, the third vehicle begins to identical landmarks also seen by 
another vehicle. Again, the resulting uncertainty of the entire map is reduced, as can 
be seen easily. The last panel in Figure 12.16 shows the final map, obtained after 
500 iterations. This example shows that SEIFs are well-suited for multi-robot SLAM, 
assuming that the initial poses of the vehicles are known. 
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12.11 DISCUSSION 


This paper proposed an efficient algorithm for the SLAM problem. Our approach is 
based on the well-known information form of the extended Kalman filter. Based on 
the empirical observation that the information matrix is dominated by a small number 
of entries that are found only between nearby features in the map, we have developed 
a sparse extended information filter, or SEIF. This filter enforces a sparse information 
matrix, which can be updated in constant time. In the linear SLAM case with known 
data association, all updates can be performed in constant time; in the nonlinear case, 
additional state estimates are needed that are not part of the regular information form 
of the EKF. We proposed a amortized constant-time coordinate descent algorithm for 
recovering these state estimates from the information form. We also proposed an effi- 
cient algorithm for data association in SEIFs that requires logarithmic time, assuming 
that the search for nearby features is implemented by an efficient search tree. The 
approach has been implemented and compared to the EKF solution. Overall, we find 
that SEIFs produce results that differ only marginally from that of the EKFs, yet at a 
much improved computational speed. Given the computational advantages of SEIFs 
over EKFs, we believe that SEIFs should be a viable alternative to ЕКЕ solutions when 
building high-dimensional maps. 


SEIFS, are represented here, possess a number of critical limitations that warrant future 
research. First and foremost, SEIFs may easily become overconfident, a property 
often referred to as inconsistent [20, 15]. The overconfidence mainly arises from the 
approximation in the sparsification step. Such overconfidence is not necessarily an 
problem for the convergence of the approach [23], but it may introduce errors in the 
data association process. In practice, we did not find the overconfidence to affect the 
result in any noticeable way; however, it is relatively easy to construct situations in 
which it leads to arbitrary errors in the data association process. 


Another open question concerns the speed at which the amortized map recovery con- 
verges. Clearly, the map is needed for a number of steps; errors in the map may 
therefore affect the overall estimation result. Again, our real-world experiments show 
no sign of noticeable degradation, but a small error increase was noted in one of our 
simulated experiments. 


Finally, SEIF inherits a number of limitations from the common literature on SLAM. 
Among those are the use of Taylor expansion for linearization, which can cause the 
map to diverge; the static world assumption which makes the approach inapplicable 
to modeling moving objects [43]; the inability to maintain multiple data association 
hypotheses, which makes the approach brittle in the presence of ambiguous features; 
the reliance on features, or landmarks; and the requirement that the initial pose be 
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known in the multi-robot implementation. Virtually all of these limitations have been 
addressed in the recent literature. For example, a recent line of research has devised 
efficient particle filtering techniques [13, 23, 26] that address most of these shortcom- 
ings. The issues addressed in this paper are somewhat orthogonal to these limitations, 
and it appears feasible to combine efficient particle filter sampling with SEIFs. We 
also note that in a recent implementation, a new lazy data association methodology 
was developed that uses a SEIF-style information matrix to robustly generate maps 
with hundreds of meters in diameter [40]. 


The use of sparse matrices in SLAM offers a number of important insights into the de- 
sign of SLAM algorithms. Our approach puts a new perspective on the rich literature 
on hierarchical mapping discussed further above. As in SEIFs, these techniques focus 
updates on a subset of all features, to gain computational efficiency. SEIFs, however, 
composes submaps dynamically, whereas past work relied on the definition of static 
submaps. We conjecture that our sparse network structures capture the natural depen- 
dencies in SLAM problems much better than static submap decompositions, and in 
turn lead to more accurate results. They also avoid problems that frequently occur at 
the boundary of submaps, where the estimation can become unstable. However, the 
verification of these claims will be subject to future research. A related paper dis- 
cusses the application of constant time techniques to information exchange problems 
in multi-robot SLAM [28]. 


Finally, we note that our work sheds some fresh light on the ongoing discussion on the 
relation of topological and metric maps, a topic that has been widely investigated in the 
cognitive mapping community [6, 17]. Links in SEIFs capture relative information, 
in that they relate the location of one landmark to another (see also [7, 9, 31]). This 
is a common characteristic of topological map representations [5, 35, 18, 22]. SEIFs 
also offer a sound method for recovering absolute locations and affiliated posteriors 
for arbitrary submaps based on these links, of the type commonly found in metric map 
representations [25, 36]. Thus, SEIFs bring together aspects of both paradigms, by 
defining simple computational operations for changing relative to absolute represen- 
tations, and vice versa. 


[x] 


However, a note of caution is in order. If SEIFs were applied to a linear system (mean- 
ing, we don't need Taylor series approximations), the update would be truly constant 
time. However, because of the need to linearize, we need an estimate of the mean 
H along with the information state. This estimate is not maintained in the traditional 
information filter, and recovering it requires a certain amount of time. Our SEIF im- 
plementation only approximates it, and the quality of the posterior estimate depends 
on the quality of this approximation. We will return to this issue at the very end of 
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this chapter, where we discuss some of the shortcomings and extensions of the SEIF 
paradigm. For now, we will begin with a derivation of the four essential update steps 
in SEIFs. 


[...] 


However, to attain efficient online, updating the SEIF has to make a number of ap- 
proximations, which make its result less accurate than that of the EIF. In particular, 
the SEIF has two limitations: First, it linearizes only once, just like the EKF. The 
EIF can re-linearize, which generally improves the accuracy of the result. Second, the 
SEIF uses an approximation step to main sparsity of its information matrix. This spar- 
sity was naturally given for the full-SLAM EIF algorithm, by nature of the information 
that was being integrated. SEIFs integrate out past poses, and this very step violates 
the sparseness of the information representation. The SEIF uses an approximation step 
to retain sparseness, which is essential for efficient updating of the information matrix 
in the context of online SLAM. 
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13 


MAPPING WITH UNKNOWN 
DATA ASSOCIATION 


13.1 LATEST DERIVATION 


This is the correct derivation of EM, unfortunately using a slightly different notation 
than in the rest of the book. It sits here to remind Sebastian that it has to be incorpo- 
rated into the text and parts of it have to ultimately disappear into the appendix. So, 
reader, don't read this. If you do this anyhow, we use s; for the pose at time t and st 
for the poses leading up to time t. z is a measurement, and u is a control. Also, the 
control for the interval (t — 1, ¢] is denoted uz, and not и; 1. Good luck! 


Here it goes... 


p(d'sm) = p(d'|s',m) p(s'|m) 
n | [»Gu15s-. 5) [ [76-157 m) (13.1) 


Taking the logarithm on both sides gives us 


log р(а°, зт) = % + S 1овр(и, |, 87-1) + У 108р(2. |, т) (13.2) 


T. T 


Here ņ and 7’ are constants. Introduce binary fields of indicator variables Г,, and 
Г, в. 1: ls, = lif and only if the robot’s pose at time t was s,, and /„ ,, = lif 
and only if the robot’s pose at time t was s, and the pose at time t — 1 was s+—1. The 
set of all of those variable will be called 7. They are the latent variables. This gives us 
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the form: 
(13.3) 


log p(d', I|m) 
i € Y | f 1« s logplulsr sei) ds, dei Y, ја, log р(2. |57, m) 48 


Now calculate the expectations of those indicator variables I: 
(13.4) 


E; [log p(d', Дт) 
т + ы log р(и.|8т, $41) ds; ds; 1 + X log p( z^ |57, m) ds, 


Il 


Er 
т+У | | Et... Mogptucl sr) ds; ds +$ | Et. Vogpslsss m) ds, 


where the expectation of the indicator variables J is conditioned on the current map, 
тій. In other words, we have (1 am not sure it's good to even write it in this form): 


(13.5) 


Est [log p(d*, s*|m)] 
= +} | f Persei, 24!) log р(иг |8, 57—1) ds, ds,_1 


+} етй, 2, ut) los pss [sss m) ds 
Thus, in the E-step we have to calculate the expectations 


I; = р(ѕ|тії, z*, ш!) 
р(8+,ву—1|тЇ1, zt, и) (13.6) 


з 

3 

ъ 

3 

| 

5а 
Il 


for all t, s}, апа s,..,. In the M-step, we seek to maximize (13.5) under these epec- 
tations, which will give us the (7 + 1)-st map. Fortunately, not all the terms in (13.5) 
depend on the map m: 


т = amgmaxY] / І, log pz. |s,, m) ds; (13.7) 
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In retrospect, we notice that we don't even have to calculate the indicator variables 
I,,.,5, ,. Relative to the map т, the term is a constant. So we can omit them early on. 


This gives us for the E-step, assuming uniform prior over robot poses (here 20" = 
(2425 2т+2,... A) 


I = p(s.mH,zt uf) 
= np, u Im, s,,27,u") г Z^ yu) 
= np(zV uv mH, s.) p(s,|mEl, z7,u7) 
2 п! р(ѕ„ |ті, z^, ш") pz’, Pm il) p(s. |mPl, 27, шт) 
= 1" p(s, |ті, 2%", ш") р(в,|тЇї, 27, шт) (13.8) 


which is a forwards-backwards localization. 


For M-step, we have: 
т = argmaxY] / Г, log p(z,|s,., m) ds; (13.9) 


Conveniently decouple this for each location (2, у). Notice, this is approximate, as 
those cells aren't independent. An alternative would be to run Simulated Annealing. 
But decoupling common in the literature. 


mE = чанах | 1. log p(z, s. , Ms y) ds; (13.10) 


Mgr, y 


We observe that 


login pP(Me,y|sr, 27) Р(®+ |в+)]| 
= m -logp(mz;,|s-,2-) + log p(zz|s-) (13.11) 


log p(zr|87, Mz,y) 


and hence 


mitt = oma: f 1 „Пов p(mz,y|$7, zr) + log p(z-|s-)] ds; 


Mgr, y 


356 CHAPTER 13 


= argmax > | 1. log p(Mz,y|S7, Zr) ds- (13.12) 


Great, eh? The only thing I don't like is the M-step. If we can fix that, we might 
actually prove convergence! 


АП right, I let you read the motivation now. Don't pay attention to any of the deriva- 
tions. 


13.2 MOTIVATION 


In the previous chapter, we discussed a mapping algorithm that calculates the full, joint 
posterior over poses and maps. From a probabilistic standpoint of view, the estimation 
of the posterior is clearly the gold standard, as it carries information about the map 
uncertainty, the pose uncertainty, and even the cross-dependencies between pose and 
map estimates. 


However, to obtain a posterior estimation algorithm, the approach described in the pre- 
vious chapter had to resort to a number of restrictive assumptions. The most important 
of those assumptions is the absence of a data association problem. Put differently, it is 
critical for these algorithms that they can can establish correspondence between sights 
of the same environment feature at different points in time—otherwise the posterior 
would not be multi-modal!. For this reason, mapping algorithms using Kalman fil- 
ters often extract from the sensor data a small number of landmarks (features) which 
can be disambiguated easily. By doing so, they are forced to throw away most of the 
data. For example, in the underwater mapping results mentioned in the previous chap- 
ter, most of the measurements of the underwater sea bed were discarded, and only a 
small number of feature detections were used for building a map. How much better 
a map could one build by considering all available data! These observation motivates 
our desire to develop mapping algorithms that can cope with ambiguous sensor data, 
and with the data association problem. Such algorithms are subject of this and the 
following chapter. 


This chapter presents a family of mapping algorithms that specifically address the 
data association problem. In particular, they can cope with ambiguous landmarks and 
estimate the correspondence between them on-the-fly. Variants of the basic algorithm 
can also cope with raw sensor data, by compiling sequences of data into small local 
maps, and merging them in a statistical sound way into a global, consistent map. On 
the downside, the algorithms developed here have two disadvantages when compared 
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to the Kalman filter approach. First, instead of computing the full posterior over maps 
and poses, they only strive for the most likely map. Consequently, the result of this 
algorithm does not explicitly represent the uncertainty in the map or the robot poses. 
Second, the algorithms discussed here are batch algorithms. They require multiple 
passes through the entire data set. Kalman filters, in contrast, are incremental. Once a 
measurement has been processed, it does not have to be memorized. 


Nevertheless, the algorithms discussed here are important from a practical point of 
view. The data association problem is generally recognized as the most challenging 
problem in concurrent mapping and localization, and these algorithms are currently 
the only known approach that addresses this problem in a statistically sound way. The 
algorithms have also generated some of the largest maps ever built, in environments 
that are intrinsically difficult to map. 


This chapter contains some statistical material which is slightly more advanced than 
the material in the previous chapters. In particular, it uses (without proof) Dempster's 
EM algorithm [], which is a well-known statistical technique for maximum likelihood 
estimation with missing data. To communicate the basic ideas first before diving into 
mathematical formalism, this chapter begins with an informal description of the basic 
ideas underlying all of the algorithms discussed in this chapter. In particular, 


m Section 13.3 introduces the basic ideas using two simple, artificial examples. 


m Section 13.4 states the basic EM algorithm for concurrent mapping and localiza- 
tion, and derives it mathematically. 


m Section 13.5 discusses how implement EM. In particular, it discusses the imple- 
mentation of a landmark-based mapping algorithm, which uses piecewise con- 
stant approximation of probability density functions, and points out strategies of 
reducing the computational complexity. 


m Section 13.6 introduces a layered version of this algorithm, which relies on local 
maps as the basic entities for EM. As is discussed there, this layered version 
makes it possible to apply EM to occupancy grids-style metric maps. 
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Figure 13.1 Environment (thin line); the robot's path (grey); and the measured path (thick 
line). The robot can perceive whether or not it is in one of the indistinguishable corners, 
within a certain maximum perceptual range. The dots indicate locations where the robot 
perceived a corner. 


13.3 MAPPING WITH EM: THE BASIC 
IDEA 


Let us begin by discussing the basic ideas and concepts in mapping with EM. As we 
will see, the general idea are quite intuitive and, like everything else in this book, 
critically linked to the notion of probabilities. 


The goal of EM mapping is to find the most likely map given the data. Put mathemat- 
ically, we are now interested in calculating 


argmax P(m | d) (13.13) 


for arbitrary data sets and maps. Unfortunately, searching the space of maps exhaus- 
tively is impossible, since it is too large. EM therefore performs hill climbing in the 
space of all maps. More specifically, it starts out with a very poor map, then develops 
a better map, and so on, until it finally arrives at a local maximum in likelihood space. 


In particular, and this is the basic "trick" of EM, EM iterates two different estimation 
steps 


ш  alocalization step, also called E-step for reasons given below, in which the robot 
localizes itself in its current best map, and 


ш a mapping stape, also called M-step, where a new, improved map is calculated 
based on the result of localization. 
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Figure 13.2 The initial alpha-values, using only odometry to compute posterior distribu- 
tions over the robot's poses. 


Clearly, this problem decomposition has the advantage that the steps individually are 
much easier than the concurrent In Chapter 7, we have already seen how to localize 
robots in a known map (the E-step). In Chapter 9, we have discussed techniques for 
building maps from known poses (M-step). The EM approach alternates localization 
and mapping, using techniques that bear similarity to the ones described before. 


To illustrate EM, consider the rectangular environment shown in Figure 13.1. Let us 
for the moment assume that the robot has a sensor for detecting corners. Whenever 
it is near or in a corner, it can perceive its relative location, but it is unable to tell 
the corners apart. Notice that this environment is highly symmetric. The mapping 
problem clearly raises the data association problem of establishing correspondence 
between different detections of the same corner. Figure 13.1 also shows the robot's 
path (in gray). Dots along the path indicate when the robot's sensors detected a corner. 
Obviously, our robot tends to drift to right, relative to its odometry measurements. 


The EM algorithm starts with a localization step without any map. Figure 13.2 shows 
this for the different points in time. Initially, pose defined to be (2 = 0, y = 0,0 = 0), 
which is shown in the first (and leftmost) diagram in Figure 13.2. After the first motion 
segment to the adjacent corner, robot's belief shown in second diagram. This belief 
is purely based on odometry; since there is no map, no opportunity to constrain the 
belief in any way. The next three diagrams show robot's belief after the next three 
motion segment, illustrating that the robot's uncertainty increases over time. 


The pose estimates are now used to generate a first, highly uncertain map. This map 
is shown in Figure 13.3. Here the grey-level corresponds to the likelihood that a x- 
y-location in the map is a corner: Black means the robot is certain there is a corner, 
white means the robot is certain there is none, and values in between indicate prob- 
abilities for cornerness between О (white) and 1 (black). The map in Figure 13.3 
possesses several remarkable properties. First, it is highly uncertain, as indicated by 
the large regions that might possible be corners. The grey-ish circle at the bottom left 
corresponds to the first perception, where the robot knew its pose (by definition it was 
(0, 0, 0)) and only saw a single corner within its circular perceptual field. The other 
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Figure 13.3 The initial map. 


grey regions correspond to points in time where the robot perceived the same sensor 
reading, but was less certain about its position. Hence, the resulting map is less sharp. 


The robot now re-runs the localization, but this time using the initial map. The map, 
while far from accurate, nevertheless leads to improved localization. The new local- 
ization run based on the initial map is shown in Figure 13.4. Comparison with the 
first localization results (Figure 13.2) shows that the margin of uncertainty has shrunk, 
illustrates the positive effect of the map on localization. 


So far, only talked about forward localization, using the data leading up to time t to 
determine the robot's pose at time t. Strictly speaking, even data collected after time 
t carries information about the robot's pose at time t. Thus, some sort of backwards 
localization is called for that uses future data to revise past beliefs. Figure 13.5 shows 
the result of backwards localization. This figure is to be read from right to left. Since 
the robot's final pose is unknown, we start with a uniform distribution as indicated in 
the rightmost diagram in Figure 13.5. One time step earlier, the robot's distribution is 
given by the diagram next to the left. This is identical to global localization, just in 
reverse time order. The fact that this diagram contains a collection of circles indicates 
that after a single measurement followed by a motion command, the robot is probably 
on a fixed distance to one of the (approximately) known corner locations. From right 
to left, the diagrams in Figure 13.5 show the result of backwards localization all the 
way to the initial time step. 


Strictly speaking, our approach requires backwards in all iterations, even the first, but 
in the absence of a map the resulting distributions are uniform, which explains why 
we did not address this issue in the first round of localization. 
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Figure 13.4 These alpha-values are based on odometry (as before) and the initial, highly 
uncertain map. 


ІЕЕ 


Figure 13.5 Backwards localization: The beta-values are computed form right to left, 
effectively localizing the robot backwards in time. 


Figure 13.6 The combined results of forward and backward localization. 


Figure 13.7 The second map. 
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Figure 13.8 The combined results of forward and backward localization. 


Figure 13.9 The fi nal map. 


The true posterior over the robot location at the various points in time is shown in 
Figure 13.6. Each diagram is the (normalized) product of the forward localization and 
the backwards localization results (c.f., Figure 13.4 and 13.5, respectively). 


The reader should notice that the pose estimates, while far from being certain, are an 
improvement over the initial pose estimates when no map is used. The new, improved 
pose estimates are now used to generate a new, improved map, shown in Figure 13.7. 
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Figure 13.10 Test environment and data: (a) The environment possesses 4 indistinguish- 
able landmark locations, labeled A to D. The robot's path is: A-B-C-D-C-B. (b) First data 
set, as measured by robot's odometry. (c) Second data set, which contains one more step 
than the fi rst. 


Figure 13.11 Data set 1, localization after three iterations of EM. 


Figure 13.12 Data set 2, localization after three iterations of EM. 


A comparison with the first map (Figure 13.3) shows that now the robot is more certain 
where the corners in its environment are. 


The process of localization and mapping is iterated. After 10 iterations, the process 
nearly converged for our toy-like example. The resulting pose estimates are shown 
in the bottom row in Figure 13.8; the top and middle row show the result of forward 
and backward localization, respectively. At this point, the uncertain vanished almost 
entirely, even for the intermediate locations. At the last point in time, the robot is now 
certain that the current corner is identical with the first, as illustrated by the diagram in 
the bottom-right corner. Remarkable are the results of backward localization, which 
are spirals in x-y-0 space (and hence circles in the projective x-y space). 
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The corresponding map, shown in Figure 13.9, shows the location of four corners that 
is slightly rotated but otherwise consistent with the true arrangement. The final map 
also contains a spurious fifth corner, which is the result of “over-fitting” at the very 
border of the robot's perceptual range. 


A second example is shown in Figures 13.10 to 13.12. This example illustrates how 
the algorithm effectively uses data to revise beliefs backwards in time. In this example, 
our algorithm first fails to localize the robot correctly, due to lack of data. As more 
data arises, the false beliefs are identified and corrected in favor of the correct ones. 


Figure 13.10a shows the environment: A multi-corridor environment that possesses 
four "significant" places (labeled A to D). For the sake of the illustration, we assume 
that these places are all indistinguishable. The robot's path is A-B-C-D-C-B, that is, 
the robot moves up, turns right, then turns around and move half way back to where 
it started. Of course, the odometry is erroneous. Figures 13.10b&c depict the path, 
as obtained through the robot's odometry. In Figure 13.10b, the situation is depicted 
after the robot revisited location C. One step later, the robot is back at location B, as 
shown in Figure 13.10c. 


The interesting feature of this example is that after returning back to location C (Fig- 
ure 13.10b), the robot's pose is much close to location B. Since our robot actually 
cannot receive any corridors/walls, this suggests that maximum likelihood solution 
places the robot is at B, and not at C; this is a case where a false solution is actually 
more likely than the correct one. Figure 13.11 confirms our suspicion. It shows the 
robot's pose beliefs after three iterations of EM. The interesting diagram is the one on 
the right: Here the belief possesses two modes, one at the correct location and one at 
location B. Since the odometry is closer to B than to C, the likelihood of being at B is 
slightly larger. Thus, if one wanted to know the most likely path, EM would chose the 
one shown in this diagram. 


The picture changes as the last data item is added, in which the robot moved from 
C to B, as shown in Figure 13.10c. This last step is easily explained if the robot is 
at C; starting at B, however, the new observation is less consistent. Applying EM 
to this new data set yields the beliefs shown in Figure 13.12. Here the most likely 
path is actually the correct one, even though the likelihood is only changed by a small 
amount. This example illustrates that EM is capable of using data to revise beliefs 
backwards in time. This is important specifically when mapping cyclic environments. 
Then first closing a cycle, the sensor data often is insufficient to determine the robot's 
pose correctly, requiring some sort of backwards-correction mechanisms that can fix 
errors in maps as new evidence arrives. 
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This completes the examples of EM mapping. We will now formally describe the EM 
mapping algorithm and derive it mathematically. Iterates localization and mapping. 
The forward pose estimates will be called a, the backward estimates 3, and the product 
of both y. Algorithm: maximizes the map likelihood. Pose estimated may not be 
interpreted as posterior, since evidence is used multiple times and hence robot is *over- 
confident." Our illustration also has implications for the choice of the representation. 
Even in this simple example,the intermediate pose estimates cover a wide range of 
distributions. 


13.4 MAPPING WITH THE EM 
ALGORITHM 


13.4.1 The EM Mapping Algorithm 


Armed with the necessary mathematical derivation, we will now state the general EM 
mapping algorithm. Table 13.1 depicts the algorithm EM _mapping(). Here n denotes 
a generic normalizer that ensures that the desired probability distributions integrate 
(sum up) to 1. 


The algorithm EM mapping() is a batch algorithm. Its input is a data set d, from 
which it generates a map m. The algorithm EM mapping is initialized in line 2 by 
assigning uniform distributions to each variable in the map. The main loop, which is 
iterated until convergence, implements the E-step (lines 4 to 11) and the M-step (lines 
12 to 16). The E-step consists of three parts: a forward localization (lines 4 to 6), 
an inverse localization (lines 7 to 9), and a multiplication to obtain the full posterior 
beliefs over all poses (lines 10 and 11). The M-step computes, for each map element 
(2, y), the non-normalizes likelihood that the world's property at (2, y) is l (lines 13 
and 14), and subsequently normalizes these probabilities (line 15). Finally, it compiles 
a single, global map (line 16). After convergence, the final map is returned (line 17). 
Empirically, 3 to 5 iterations often result in a map that is sufficiently close to the 
asymptotic result, so that the computation can be terminated. Notice that when stating 
the algorithm, EM mapping() we omitted the index j. This is legitimate because the 
algorithm is incremental, ant it suffices to memorize the most recent map т; and 
beliefs Bel; ((?). 


Make Figure: The L-shaped example environment, and maploc distributions 
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Algorithm EM mapping(d): 
initialize m by uniform-map 


repeat until satisfi ed 
a = Dirac((0, 0, 0)) 
fort — 1 to T do 
al) = п P(o | s,m) f P(s® | a(t7 D, gt-D) at=) ds® 
BD) = uniform() 
fort = T — 1 down to 0 do 
Bo - nf P(s® | alt), s(t+1)) P(ott+D | зЁ) m) pite» ds (t^) 
fort — 1 to T do 
Bel(s(?) = al BO 
for all (x, y) do 
for alll € L do 


77 T 
Thay) = Zio PO? | sO, m, yy = [). 
A Bel(s)) ds) 


х,у) €range(o(*) ,s(*)) 


TM x,y) =l = (ver (wy) 2) тау) =1 


т = {хуу} 


return m 


Table 13.1 The general EM mapping algorithm. 


Figure ?? shows an illustrative example of the algorithm ЕМ mapping() in action. 
Here an imaginary robot, equipped with a sensor that produces noisy measurements 
of range and bearing to nearby landmarks, moves through an L-shaped environment. 
The environment possesses four landmarks. To make things interesting, let us assume 
the landmarks are indistinguishable to the robot, and the robot does not know how 
many there are. Figure ?? shows the robot's true trajectory, along with the measured 
one. Obviously, there is odometry error. When returning the landmark labeled 4, 
odometry suggests that the next landmark is closes to the one labeled 2; in reality 
however, the robot encountered landmark 3, and then moved back to landmark 2. 


[[[Insert picture, complete the story]]] 
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Illustrates the importance of uncertainty. Also demonstrates that the robot can revise 
past beliefs as more evidence arrives. This is essential for "closing the loop." 


13.4.2 The Map Likelihood Function 


We begin our formal derivation with a statement of the likelihood function, whose 
maximization is our goal. As stated in the introduction of this chapter, our approach 
maximizes 


P(m | d), (13.14) 


the likelihood of the map given the data. Our first exercise will be to express P(m | d) 
in terms of familiar conditional probabilities, in particular the perceptual model P(o | 
s,m) and the motion model P(s’ | a, s). In particular, we will find that 


P(m | d) 


T T-1 
= mP(m) I e [ПР | m, 809) П P(s@ | a, 500) ds, ... (303) 
t=1 


t=1 


where 77 is a numerical constant: a normalizer that ensures that the left-hand side inte- 
grates to 1 over all m. Since we are only interested in maximization, not in computing 
the actual likelihood of a map, 77 is irrelevant. 


The term P(m) in (13.15) is the prior on maps. It allows one to express an a priori 
preference for certain maps over others. For example, if the environment is known 
to consist of parallel and orthogonal walls only—a commonly made assumption that 
can greatly simplify the mapping problem—one might exclude maps that do not obey 
that assumption. Throughout this text, however, we will consider the most general 
case and assume that P(m) is uniformly distributed. Hence, P(m) is a constant and 
can be ignored in the maximization, just like 7. With these considerations, (13.15) 
shows that, at least in principle, finding the most likelihood can be done solely from 
the perceptual model P(o | s, m), the motion model P(s' | a, s), and the data d. 


To see that (13.15) is correct, let us begin by applying Bayes rule to P(m | d): 


P(m|d) = NC (13.16) 
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Obviously, the denominator P(d) is a constant factor that does not depend on m. To 
simplify the notation, we prefer to write 


P(m|d) = m P(m) P(d | m) (13.17) 


where 77; is a normalizer that ensures consistency: 


m = bs P(d | m) Pim) (13.18) 


The rightmost term in Expression (13.17), P(d | m), can be rewritten using the The- 
orem of Total Probability: 


P(d|m) = Jef eama) 
P(s,...,8 | m) ds©,...,ds™ (13.19) 


Here we benefit from our careful definition of the motion model in Chapter ??, which 
ensures that the rightmost term, P(s(9,..., 50? | m), is a constant that does not 
depend on m. Hence, we obtain 


P(d|m) = m [ f Pa m.s... s) dO, ast (13.20) 


where 72 is another normalizer. Further below, we will subsume 7; and 72 into a 
single constant, but for now let's keep them separate. 


The key observation in our derivation is that given the map m and all T poses 
s0)... s(T), the Markov assumption renders all observations and motion commands 
conditionally independent. In other words, P(d | m, s(9, . . . , s(7?) can be re-written 
as a product of “local” probabilities, conditioned on the appropriate state information: 


T T—1 
Palms is) = [[PO® | s,m) T [| Pa |, а аа) 


t=0 t=0 
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The last term, P(a(? | (+1), s,m), is simplified by first noticing that a) does not 
depend at all on the map m, hence 


P(a® | s,s m) = Р(а |80, 5) (13.22) 
and then further applying Bayes rule 


P(st*) | а®,в®) P(a(? | (0) 
P(s@+)) | s(t)) 


Р(а® | s+, 80) = (13.23) 


Again, the denominator P(s‘'+ | s(?) is a constant and so is the distribution P(a® | 
800), since in the absence of any other information a single pose does not convey 
information about subsequent poses or motion commands. Hence, we can re-write 
(13.23) using the normalizer ng: 


P(a® | s,s, m) = тз P(s | a, 5) (13.24) 


Substituting (13.24) into (13.21) yields 


T T-1 
P(d|m,s,...,8) = mJ Plo | s,m) [[ Pls | а 8099) 
t=0 t=0 


Notice that we have managed to express the desired probability in terms of familiar 


expressions: The perceptual model P(o | s,m) and the motion model P(s(^*? | 
(0, 5) 
а, 800). 


We now substitute (13.25) into (13.20) and obtain 


P(d|m) = (13.26) 
T T—1 

n m f- f Iro? | s,m) [[ P(st**? | a, s) as, .... ав) 
t=0 t=0 


and back into (13.17): 


P(m|d) = (13.27) 
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q T-1 
" Pom) f [ [TP | s,m) [T Pls | a, 59) as... as? 
t=0 t=0 


Here we collapsed all normalizers into a single one (7) = 717214), which yields the 
desired expression (13.15). 


13.4.3 Efficient Maximum Likelihood 
Estimation 


Having successfully derived the formula for map likelihood the next logical question 
is how to actually maximize it, thus, how to find the most likely map. 


A simple-minded approach, which does not work, would be a generate-and-test ap- 
proach: 


Algorithm generate and test(d): 


for all maps m do 
compute non-normalized likelihood p x P(m | d) 
if p larger than that of any previous map then 
m* =m 
return m* 


Unfortunately, this algorithm fails for practical reasons. If each discrete map has 104 
binary elements, which is much less than the number of grid cells for some of the maps 
shown further below, then there will be 210-000 different maps. Evaluating a single one 
of them requires summation often over as 10^ nested integrals or more (T = 104 data 
items may be easily collected in a few minutes), each of which can take another 109 
values or so—this is clearly infeasible. The fact that the likelihood function is highly 
non-linear suggests that no practical algorithm exists that may ever be able to find it. 
We are certainly not aware of one. 


The EM algorithm, which will be described in turn, implements hill-climbing in like- 
lihood space. As noticed in the introduction, EM is an iterative algorithm. Let 
j — 0,1,... denote the iteration counter. For simplification of the presentation, let 
us assume that rn. | denotes the empty map, by which we mean a map where each 
T (s...) is uniformly distributed. EM iterates two steps: 
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1. Localization (E-step): The E-step computes the belief distribution over poses 
(0) (T) 
S5....,8 


Belg(s?) = P(s | mg- d) Vt—1,..,T (13.28) 


conditioned on the map т; and the data d. 


2. Mapping (M-step): The M-step calculates the most likely map 
mij = argmax Р(т | Bely (s®),..., Belg (s), а) (13.29) 


f JE | s,...,8, q) 
Belj (s)... Belgj(s(D) ds(9,..., ds? (13.30) 


In the initial E-step, m is the empty map and the E-step calculates 
Belg(s?) = P(s |а) Vt=1,...,T (13.31) 


Since the most likely map can be difficult to compute even if the poses are known, 
further below we will introduce an additional independence assumption that allows us 
to decouple the computation in an efficient way. During the optimization, we will also 
consider probabilistic maps that contain a notion of uncertainty. 


13.4.4 The E-step 


Expression (13.28) is similar to localization (Chapter 7), but not identical. This is 
because for t < T, Beli (s?) depends also on data collected at times t’ > t, i.e., 
"future" data (from the perspective of time t). With appropriate assumptions, however, 
we will show that Beli (s(?) can be expressed as the normalized product of two 
terms: 


Belg(s?) = m P(s® | mj 4,a9-9) P(s' | mg- d) (13.32) 


This is convenient, as each of the terms on the right-hand side can be computed inde- 
pendently using Markov localization, as explained in the previous chapter. 
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To see, we recall 


d = 49-9 udn (13.33) 
with 

dO) = a of)... aD, 9 (0 (13.34) 

dt-D = g® ot). gf) 0) (13.35) 


and note that 


Beli (s®) = P(s® | my—1,4) (13.36) 
= P(s® | my- doP, qt) (13.37) 


Bayes rule gives us 


P(q(9-.0 1 gt). dt D) P(g® Lon gis) 

Beljj(s0) = ( ыш ын [ти 
2 P(d©.--t) | mp-ap dit D) 

= ns P(d(9-9 | mi, 60, de?) P(s© | mij 47, 063939) 


ns P(d9-? | mj 4), 8?) P(s | mg, d*- D). (13.40) 


Il 


where ns is a normalizing constant. The last transformation exploits the Markov as- 
sumption: knowledge of th map m;..1; and the pose s() at time t renders past data 
409-0 and future data 107) independent. We will now apply Bayes rule again to the 
second term in (13.40), namely P(d(9—) | mi; .4j, s(?), from which we obtain 


P(s( | mj], d?) Р(40--0 | тр 
P(s® | mp) 
ne P(s | my—y,d©-) Pid? | mG1392) 


Р(а9-0 | my- s?) _ др 


І 


Here ne is yet another constant. Substituting this back into (13.40) yields 


Bely(s®) = тт P(s | туу 1,000) (13.43) 
P(d9—9 | тру) P(s® | my- aD) (13.44) 
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Since P(d©-) | mj-1j) does not depend on s), it can be replaced by another 


constant, which gives us 
Belg(s?) = m P(s | mg a, d 9-9) P(s® | my 4, dt T) (13.45) 


Here ny is a constant that subsumes 74, ns, and P(d©--*) | mi 1]). 


Henceforth, we will call 


ab. = P(s(9 | miei d) (13.46) 
Bp = PEO made) (13.47) 


which implies 


Belg(s?) = mot) 80) (13.48) 


The reader may notice that the computation of af) is analogous to forward- 


localization as described in the previous chapter. The computation of a can be 
viewed as “inverse localization,’ where data collected after time t is used to retroac- 
tively estimate the position at time t. The 6-values add additional knowledge to the 
robot's position, typically not captured in localization. They are, however, essential 
for revising past belief based on sensor data that was received later in time, which is a 
necessary prerequisite of building large-scale maps. 


Computation of the o-values 

The computation of the a-values is equivalent to the probabilistic localization algo- 
rithm described in the previous chapter with known initial pose s(? = (0,0,0). For 
completeness, it is briefly restated here. 

Initially, the robot is assumed to be at the center of the global reference frame. Thus, 


а? (for any j = 0, 1,...) is initialized by a Dirac distribution centered at (0, 0, 0): 


o) = P(e | my_y,d-) (13.49) 
= P(s) (13.50) 
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Е | 1, if s = (0,0,0) (13.51) 


0, if s? 52 (0,0,0) 


АП other ot with t = 1,..., Т are computed recursively by Bayes filters (c.f. Chap- 
ter 22): 


of) = P(s?|my 4j, d9-9) (13.52) 
= P(s® | mj 45,090,907, q(9.:70) (13.53) 
= т P(o(? | 8) тууда D, q(9 170) P(s(0 | т-у; 07D, 4(%15.34) 
= me P(o | 0, mg 4j) P(s | туа 9, q9--t70) (13.55) 


where ne is again a probabilistic normalizer, and the rightmost term of (13.55) can be 
transformed to 


P(s® | my—1,a-Y, q(917U) (13.56) 


Il 


II 


Jae | Ym ya, 09-0) P(s(t7D | тур, a7, dO" Ny as 


pre | a&7 D, 8-0) p(s*79 | mz 44, d(9-170) ds (13.58) 


2 [Pe | a=), st) aD galt (13.59) 


Substituting (13.59) into (13.55) yields the recursive rule for the computation of all 
a) with boundary condition (13.51): 


(0) є 1, if s© = (0,0,0) 
a ш Ls if s 52 (0,0,0) (13.60) 
of) = m POP |909, ma) f PEO | atd, 5679) aff a361 


for allt = 1,..., T. Notice that (13.60) and (13.61) can be computed efficiently from 
the data d, the model m;..,;, using the motion model P(s’ | a, s) and the perceptual 
model P(s | o, m). 


Computation of the G-values 
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The computation of 6-values is analogous, to that of the a-values, with the exception 
that it takes place backwards in time and that the boundary conditions differ. The 
values 8 (T) do not depend on any data. Hence, BE which expresses the probability 
that the robot's final position is s, is uniformly distributed: 


B — uniform (13.62) 


All other -values are computed recursively by applying Bayes filters backwards in 
time: 


8p == P(e | mug de) (13.63) 


T Jan |) mi aj dT) Р(8®@#) | mya d^) tél) 


= [Pe | al), sD) P(s(t*D | myy, det, of+D) 885) 


By now, the reader should recognize that the first transformation that led to (13.64) is 
a result of applying the Theorem of Total Probability, and the second transformation 
exploits the Markov assumption with some terms reordered. 


Following the same rationale as in the derivation of the a-values, we will now trans- 
form the last term in (13.65) using Bayes rule and then exploiting the Markov assump- 
tion, to obtain 


P(s(**9 | mij 44, 61D), of) (13.66) 
Я P(ott+) | al) gaia Р) Р(8%+1) | тр). " 
B PED [mp de1-77) re 
= m P(ott) | в D mya qe) P(s@+)) | my —1), d+" (13)68) 


nz P(o**9 | oY), mi цу Bt (13.69) 


Il 


Substituting (13.69) into (13.65) gives us, together with the boundary condition 
(13.62): 


Bp" — uniform (13.70) 


B m [ Pi | a®, SC FO Р(о%+® | sD, mga) ap? 45871) 


I 
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Thus, the @ values are best computed backwards in time, starting at t = Т and back 
to T' = 0. For practical purposes, one might want to assume that the motion model is 
symmetric, i.e., P(s' | a, s) = P(s | a, s^). 


First E-step 


In the first computation of the E-step (j = 0), where no map is available, P(o | 
s, Mj—1]) is assumed to be distributed uniformly. This is equivalent to ignoring all ob- 
servations {o0 об), е 00} in the computation of the œ- and (-values. The result- 
ing position estimates are only based on the motion commands (a (9), a)... a78} 
in the data d. This completes the derivation of the E-step. 


13.4.5 The M-step 


Recall that the goal of the M-step is to compute the most likely map from the pose 
beliefs Belj;)(s‘7), d): 


my) = argmax Р(т | Belj; (s(9, ... ‚Ве! (8072), а) (13.72) 


To facilitate the generation of the map, ме decompose ће mapping problem into a 
collection of local problems, that is, we independently generate local maximum like- 
lihood maps for each (x, y)-location 


gj = argmax Р(тү„ уу | Belgy(s9),..., Bely (s), а) (13.73) 


T (x,y) 


™ (ay), Й 


and construct the map т; by pasting together these pieces: 
mp = U mest (13.74) 
(x,y) 


Strictly speaking, for some sensor types the resulting map does not maximize likeli- 
hood. For example, a sonar reading of length o can be explained by a single obsta- 
cle within its cone; hence, different (x, y)-locations cannot be mapped independently. 
However, the independent treatment of different locations is pervasive in the literature, 
as it transforms the high-dimensional mapping problem (often with more than 10^ di- 
mensions) into an equal number of independent one-dimensional mapping problems, 
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which can be solved efficiently. Finding computationally feasible solutions for (13.72) 
that circumvent this decomposition is an open problem with combinatorial character. 


The locality of our decomposition makes it straightforward to compute the maxi- 
mum likelihood expression (13.73). Recall that | € L denotes a property (e.g., oc- 
cupied, unoccupied), and L denotes the set of all properties that the world can have at 
any location. The maximum likelihood map т, у), [;; under fixed position estimates 
Belyj(s),..., Beli; (s/'? is computed according to the weighted likelihood ratio 


тл = Plmey =l |d, Beli (s), ‘tis Belj;(s) 


Expected # of times l was observed at (x, y) 
( 


І à (13.75) 
Expected # of times anything was observed at (2, y) 


The expectation is taken with respect to the beliefs Beli; (s) to Beli; (s). Ex- 
pression (13.75) follows a frequentist approach, which equates the likelihood that lo- 
cation (2, y} has property l with the empirical frequency, weighted appropriately. Intu- 
itively, we just “count” how often the robot saw / at (x, y) and divide it by the number 
of times it saw anything at (z, y). This ratio maximizes the likelihood of the data. 


To make this more precise, let us now transform (13.75) into a mathematical expres- 
sion. The enumerator of (13.75) is given by 


T 
b | P(mqs,yy = 1| o9, 59) Ii, „уєтапве(о® (0) Belty(s) ds (13.76) 
t=0 


This expression cumulates, with the appropriate weighting, the information we have 
about the map at (x, y). 


The term P(m(,,, = l | o (0, (0) specifies the probability that the property of the 
map at (2, y) is l, judging from the t-th sensor measurement o(? and under the as- 
sumption that the robot's pose was s(?. Of course, o? might not carry any informa- 
tion about M+ y). This is the case, for example, if the distance between (2, у) and 
s(! exceeds the robot’s perceptual range, or if the sensor measurement is blocked by 
an obstacle in-between. Hence, the probability is multiplied by an indicator variable 
I(s.y)erange(o(0,5(0). Recall that the function range(o, s) returns, the set of coordi- 
nates covered by the sensor measurement o taken at s. The indicator variable J checks 
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if the sensor measurement carries any information on the map at (z, y): 


1, if (x,y) € range(o, s 
(x,y) erange(o,s) { ( y) 5 ( (13.77) 


0, if (х,у) ¢ range(o, в) 


Finally, expression (13.76) sums over all time steps t = 1,..., T and over all positions 
в®, weighted by their likelihood Belj; (s(?). 


For the denominator of (13.75), we have sum (13.76) for all | € L, which yields: 


3 Y [Pim fey) =U | 09,50) i yyerange(o(0,5(0) Bells (в ©) ds (0(13.78) 


VEL t=0 
T 

т 39) (x P(my) =! | oam) 1ж,ууєтапве(о(® 0) Beli; (s^Q ag) 
t=0 VEL 

= x МЕ (e,y) €range(o(9 s) Belg (s?) ds (13.80) 


since, trivially, $` yez P(I' | -) = 1. Dividing (13.76) by (13.80) yields the likelihood 
ratio 


T 
» Рота =! | 99, s®) Liz yjErange(ott) ,в@)) Ве (809) ds) 
t=0 

Tsai] = Е (13-81 


Y^ ei Belg (s?) ds 
t=0 


To complete the derivation, we need one last step. The enumerator of (13.81) is a 
function of a conditional density P(m(;. | o, s). This density can be interpreted as 
an "inverse" perceptual model, since it determines the state of the world from a sensor 
measurement and a robot's pose. Inverse perceptual models will be addressed in depth 
elsewhere in this book. For the time being, it suffices to notice that our decomposition 
into small, local maps allows us to conveniently write 


P(o,| s, Miz yy) P(Miz y | 8) 
P(my) |0,8) = BG Eos l v) (13.82) 
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Both P(m(,. | s) and P(o, | т, уу) can safely be approximated by a constant ns, 
so that we have 


P(mz,y) 10,8) = ms Р(о,| s, mq) (13.83) 


Notice that this expression is a version of the familiar perceptual model P(o, | s, m). 


Substituting (13.83) into (13.81) gives us the final equation that is the M-step: 


TI (a) —] = P(mi(,y = [ | d, Beli (s), nn Belj (s?) 


T 
У Ре | 89 mq у >= 1) Tis yeraugetot ау Belj (s?) ds“) 
= 43.84) 


T 
35 / Tenera go Belg (59) ds 
t=0 


This completes the derivation of the M-step. While Equation (13.84) looks complex, 
it basically amounts to a relatively straightforward frequentist approach. It counts how 
often property l was observed for location (x, y), divided by the number anything was 
observed for that location. Each count is weighted by the probability that the robot 
was at a location s where it could observe something about (x, y). Frequency counts 
are maximum likelihood estimators. Thus, the M-step determines the most likely map 
at (2, y) from the position estimates computed in the E-step. By alternating both steps, 
the E-step and the M-step, both the localization estimates and the map are gradually 
improved. 


13.4.6 Examples 


Example: 


13.5 GRID-BASED IMPLEMENTATION 


So what is there to know about implementing the algorithm EM _mapping()? In prin- 
ciple, the algorithm can be married to any of the representations discussed in Chapter 
2.5. As in the previous chapter, let us first consider the possibility of implementing 
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the algorithm EM mapping() using grids. This involves approximating both Both 
the beliefs Bel(s) and the map M using equally-spaced, metric grids. Consequently, 
integrals in Table 13.1 are replaced by (finite) sums over the corresponding grid cells. 


Under certain conditions, grids are well-suited to implemented the algorithm 
EM mappingO. Due to the inefficiency of grids, however can be extremely slow 
and memory-intensive. It can be sped up significantly by various modifications. 


= Data sub-sampling. Instead of computing a pose belief Bel(s(?) for each point 
in time £ € (0,..., T}, in practice it often suffices to estimate the pose for a 
small subset thereof. This is because short-term control/odometry errors are usu- 
ally small and therefore can be corrected by other and faster means. A simple 
example is to estimate beliefs Bel(s) only after the robot progressed a certain 
distance (e.g., 10 meters), assuming that in between odometry is error-free. Al- 
ternatively, one might correct for in-between odometry errors by fast, linear in- 
terpolation. Data sub-sampling clearly alters the output of the algorithm, but it 
may be necessary given today's computing constraints. 


m Selective updating. As described in Chapter ??, selectively updating the prob- 
ability distributions can greatly speed up the algorithm. Recall that in selec- 
tive updating, only those grid cells are considered whose probability Bel(s) is 
larger than a threshold (e.g., 0.001 - max, Bel(s)). With careful adjustment of 
the threshold, selective updating only minimally alters the result while speeding 
up computation by orders of magnitude. 


m Selective memorization. Hand in hand with selective updating, the memory re- 
quirements can be lowered by selectively memorizing only a subset of all values 
in Bel(s). A straightforward approach is to identify the smallest axes-parallel 
hypercube in the Grid that contains all cells whose likelihood exceeds the thresh- 
old used in selective updating. Alternatively, one can maintain an explicit list of 
such cells, and only store those whose probability exceeds the threshold. 


= Caching. Certain quantities, such as the motion model P(s’ | а, s), have to 
computed in every iteration of the algorithm. To speed up the computation, one 
may pre-compute P(s' | a, s) for all a“) in the data set d and memorize this in a 
set of look-up tables. 


m Exploiting symmetry. The cached densities P(s’ | a, s) are highly symmetric. 
For a fixed a, P(s | a, s") might appear to be six-dimensional: three dimensions 
for s = (2, y, 0) and three dimensions for s’ = (2, y', 0"). In practice, however, 
it suffices to memorize this density for a single value of s: s — (0,0,0). This is 
because 


P(s|a,s = P((x,y',0)]|a,(,y,0)) (13.85) 
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= P((#,9,) | a, (0,0,0)) (13.86) 
with 
& = (x'—z)cos0 + (у — y)sin0 (13.87) 
g = (у —y)cos0 — (z' — x)sino (13.88) 
6 6’ —0 (13.89) 


Obviously, the resulting density P((2,9,0) | а, (0,0,0)) is only three- 
dimensional. Further savings can be achieved exploiting symmetries in the mo- 
tion model. 


Empirically, the effect of these modifications will be gigantic! In one experiment, 
we found that they lowered memory requirements by a factor of more than 108 
when compared to a simple-minded grid-based implementation of the algorithm 
EM _mapping(), with computational savings at a similar rate. So we recommend you 
try them out! 


Make Figure: Landmark-based maps from Washington, Wean Hall. 
Figure ?? shows examples... 


Even though we chose landmarks as example, the reader should notice that our algo- 
rithm EM mapping() is generally applicable to any type of sensors. Further below. 
in Chapter ??, we will present a modified version of this algorithm which has been 
successfully applied to build maps with one of the most widely used sensors in mobile 
robotics: sonars. 


13.6 LAYERED EM MAPPING 


A pivotal problem with the algorithm EM mapping(0, as presented in Table 13.1 on 
page 366, is the fact for some sensors it may generate highly implausible maps during 
search. Technically, the source of the problem lies in the M-step, and in particular 
in the decomposition that estimates maximum likelihood map for each location (2, y) 
independently (c.f., Chapter 13.4.5). 


Make Figure: Counterexample from Wolfram’s ICML paper 
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[[[Actually, the 2-corridor map is a bad example, since it's based on occupancy grid 
maps, not the maximum likelihood estimate. Maybe we can generate a new counterex- 
ample]]] 


To give a concrete example, suppose we apply EM mapping() to a robot equipped 
with range finders (e.g., sonars), in the hope of building a consistent map of an office 
environment. If the robot's poses are known, the mapping algorithm might produce 
maps like the occupancy grid segment shown in Figure ??. Clearly, this is a nice and 
crisp map of a corridor segment. The problem arises when the pose of the robot is not 
known well, as is typically the case in the M-step of EM mappingQ. For example, if 
Bel(s) contains two distinct regions with high likelihood, the M-step might generate 
a map like the one shown in Figure ??. It is worth noting that for each individual 
location (z, y), the map in Figure ?? correctly estimates the likelihood of occupancy. 
Nevertheless, the map as a whole misleads the E-step (localization): In some areas, 
it shows three walls instead of two, and in others the corridor appears to be twice as 
wide. Configurations like these can lead to erroneous results for the localization in the 
E-step, if the range-based localization algorithm described in Chapter 7 is applied. 


Fortunately, the basic algorithm can be augmented in a way that reduces the damag- 
ing effects. We will now describe a layered version of the algorithm EM тарріпо(), 
called layered EM mapping(). Technically, this algorithm does not fully sidestep the 
independence assumption, but it keeps track of certain dependencies, thereby remedy- 
ing the problem in practice. 


13.6.1 Layered Map Representations 
Make Figure: ICML Figure 5 


The key idea of the algorithm layered EM mapping() is to represent the map by 
a collection of small, local maps, instead of a monolithic, global map. Each local 
map is generated from a short data sub-sequence. Figure ?? shows some examples, 
where each local map is an occupancy grid generated from not more than 10 meters 
of robot motion. To establish correspondence between the coordinates systems of 
the local maps, and the global coordinate frame, each local map is annotated by the 
corresponding transformation. We will refer to this new representation as layered, to 
indicate that the global map is composed of a collection of small, local maps. 


Layered maps can be learned with EM. However, our algorithm has to be modified 
appropriately to account for the fact that maps are now layered. EM with layered 
maps is graphically illustrated in Figure ??: Instead of moving back and forth between 
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pose estimates and a flat map as was the case in EM _mapping(), the new algorithm 
layered EM mapping?) first generated a set of local maps, and uses EM to estimate 
their location and orientation within the global coordinate frame. As we will see, 
the advantage of the layered approach is that within the local maps, dependencies 
between different (x, y) locations can be treated correctly. After the termination of 
EM, the local maps are fused into a single, global map. This approach inherits its 
name from the fact that two different map representations are used, a flat global map 
and a collection of local maps. 


13.6.2 Local Maps 


Local maps are occupancy grid maps acquired over a motion segment of bounded 
length. Purely to facilitate the notation, let us temporarily assume that the local map 
is generated from a single sensor measurement o(?. This allows us to write 


тб) (13.90) 


for the local map generated from the sensor measurement o). The reader may notice 
that further below, after posing the main algorithm, we will integrate longer data se- 
quences into a local map, which will reduce the computation and lead to more distinct 
local maps. However, for now we will assume a one-to-one correspondence between 
time steps t and local maps. 


Each local map carries its own, local coordinate system. To map local map coordi- 
nates back into the global coordinate frame(and thereby establishing correspondence 
between the coordinate frames of different local maps), we have to annotate them by 
а coordinate transformation. Let us write 


NO (13.91) 


for the coordinate transformation of the local map m ?. Just like robot poses, map 
poses are specified by three coordinates, с = (2, y, 0). Occasionally, we will refer to 
a? as the pose of the t-th local map (poses of maps are essentially the same as robot 
poses). 
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In the vein of this book, our approach memorizes a belief distribution over c ? with 
each map. This distribution will be denoted 


Bel(o™) (13.92) 


and is similar to the pose belief Bel(s(?) 


As we will see below, by representing maps in two layers we eliminate the prob- 
lems arising from using a single, monolithic map. This is because local maps are not 
convolved with their pose distribution; instead, the pose distribution is represented 
explicitly. 


13.6.3 The Perceptual Model For Layered 
Maps 


Our layered map representation requires a new definition of the perceptual model 
P(o | m,s). Our approach assumes that given a percept o taken at location s, any 
two local maps are conditionally independent: 


P(m( m lo,s) = P(m™ |о, ѕ) P(m©) | 0, 8) (13.93) 


for all t 52 t/. This assumption would be correct of none of the local maps ever 
overlapped. In regions of overlap it is incorrect, and might in turn lead to overly 
confident maps. Nevertheless, this assumption is essential to keep the computation 
manageable, and in practice it seems to work well. 


The independence assumption allows us to extend our perceptual model to layered 
maps. Let us first apply Bayes rule to the familiar perceptual model: 


P(o|s,m) = песе ч (13.94) 


Notice that neither o nor m depend on s, hence 


P(o|s,m) = m (13.95) 
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Under the assumption that—in the absence of any other information—all observations 
and all maps are equally likely, P(o) and P(m) can both be subsumed into a constant 
factor, called 77: 


P(o|s,m) =  P(m|o,s) (13.96) 


Observing that m = {(m),...,m 7}, and by virtue of our independence assump- 
tion (13.93), we obtain the nice product form: 


P(m|o,s) = n P(m®,...,m™ | о, в) (13.97) 


P(m™ | o, s) (13.98) 


ll 
ишш [с 


t=1 


We will now expand using Total Probability and factoring out a collection of indepen- 
dences: 


У 


T 
P(m(? |0,5) = m Про | о,в,о®) Р | 0,8) da? (13.99) 
t=1 


t=1 


T 
= 7 MEL | 0,s,0) P(o) do (13.100) 
t=1 


Bayes rule applied to the first term leads to 


P(o | s,m ,a0) P(m® | s,0™) 


P(m(? | о,в,о\®) Р(о | ѕ,009) 


(13.101) 


where, by the same logic as above, P(m(? | s,a) апа P(o | s,o(?) can be sub- 
sumed into a (different) constant, which gives us: 


T 
P(o|s,m) = m JI f Ре | s, mO, o (9) P(e) do (13.102) 
t=1 


Here 7; is again a normalizer, but it is different from the 7 above. For convenience, 
we will replace the terms P(o) by the beliefs Bel(a(?) obtained while learning the 
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map, which leads to: 
T 
P(o|s,m) = m IL f Ре | s,m, o (9) Bel(o™) do (13.103) 
t=1 


The term P(o | s,m,o™) is computed using our familiar perceptual model, dis- 
cussed in Chapter ??: 


P(o|s,m®,0) = P(o|a(9,m(?) (13.104) 


where a) = (2, (0), 6) is given by the appropriate coordinate transformation: 


20 = xcosd+ysind— 2” (13.105) 
y? = ycos@—-«sing — 2) (13.106) 
6) = 0-00 (13.107) 


Here we assumed that s = (2, y, 0) and o? = (2 ,y 000). 


Putting everything together, we obtain the perceptual model for layered maps: 
T 
P(o|s,m) = m IL / Ре | ma) Bel(o) do (13.108) 
t=1 


The perceptual model (13.108) has some interesting properties. The probability of a 
sensor measurement o is computed by “localizing” it in each of the local maps (con- 
volved by their pose beliefs), and them multiplying the results for all local maps. 
Thus, the sensor model is essentially the product of the familiar model defined over 
flat, monolithic maps. In practice, maps which cannot overlap with the robot’s posi- 
tion can be safely eliminated, thereby speeding up the computation while not affecting 
the final result. 


13.6.4 EM with Layered Maps 


The E-Step 
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With the appropriate perceptual model in place, the E-step is analogous to the flat 
mapping case. We recall from (13.32) and (13.48) that Beli; (с ©) can be factorized 
conveniently (7 is the iteration index) into 


Bel (o) = m P(e | mg, d9-9) P(e | тру, d D)(13.109) 
——ÁÓ—— 
a ва 
for all t£ = 1,...,Т. Using our new perceptual model, which requires us to substitute 


P(o | s, m) by П 5 P(o | sm), )) Beljj 4j (c?) in the original equations 


for computing the the distributions of 2 and 81), (13.60) and (13.61), we obtain: 


(0) . f 1, if s© = (0,0,0) 
"on m ds if s £ (0,0,0) MSS) 
T: 
af) = 7 П POY | 80, т0, 6) Bely-y(s®) 
t'—1 
free Па) с ds (13.111) 
and 
Bey = uniform (13.112) 
T 
© — (t) | at) (+20) 
В = m | Ps |49, 8) ||] 
t'—1 
P(o(**D | sD (0, 57) Bel(s)(s) gura ds (93.113) 
for all t = 1,...,T. 


The M-Step with Deterministic Annealing 


The M-step calculates the most likely poses of the local maps. In the most generic 
setting, the M-step calculates 


s = argmax Beli (s(?) (13.114) 
( 


g(t) 
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= argmaxa (t^? gt) (13.115) 
s(t) 
for all t£ = 1,...,Т. Such an approach would be the natural extension of plain EM to 


layered maps. 


Unfortunately, this approach is problematic in practice. EM is a hill climbing method, 
which only converges to local maxima. If the odometric error is large, the initial map 
will be erroneous, and subsequent iterations of EM might not be able to recover. 


The danger of getting stuck in a local maximum can be reduced significantly by a 
modified M-step. Instead of keeping track of the most likely pose of each map, our 
approach generates a distribution over poses that slowly converges to the most likely 
one. Thus, our approach generates a “soft” version of the maximum likelihood esti- 
mate. Over time, it gradually reduces the softness, until it finally generates the maxi- 
mum likelihood pose. This approach is known as deterministic annealing. 


In detail, the M-step generates a distribution over poses, denoted и: 


1 
Belg(s?) = m (at ge)? (13.116) 


Here ņ is a (different) normalizer that ensures that the probabilities integrate to 1. The 
parameter o is a control parameter in (0, 1] which, in analogy to the rich literature 
on annealing, will be referred to as temperature. When с = 1, the full distribution 
over all poses is retained. When o = 0, our approach is equivalent to the maximum 
likelihood assignment (13.115). 


The temperature c is slowly driven to zero—a process often referred to as cooling. In 
our approach, ø is initialized with 1 and annealed towards zero using an exponential 
cooling schedule. The effect annealing is to avoid early commitment to a single map. 
Instead, one can think of our approach as moving from density estimation (over the 
pose parameters in the map) to maximum likelihood estimation. The reader should 
notice that our approach is not the first to employ annealing to avoid local maxima in 
EM [14, 34]. 


Post-Processing 
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To come up with a single global map of the environment, the local maps m ? are 
integrated based on their final maximum likelihood poses 


argmax Bel(s(?) = argmax а) pe) (13.117) 


s(t) s(t) 


The local maps, along with their poses, are fed into the algorithm occupancy grid() 
which then produces a single, global map. 


13.6.5 The Layered EM Mapping Algorithm 


Figure 13.2 shows the resulting algorithm, called layered EM mapping(). 


Make Figure: Show ICML figures 


13.6.6 Examples 


More specifically, the data is separated into a stream of sub-datasets denoted 


d = Val! (13.118) 


during each of which the robot does not advance more than a pre-specified distance 
(e.g., 5 meters). The superscript is not to be confused with the superscript (? or the 
subscript г; in EM. 


Let us denote the local maps 


тій (13.119) 


They are built under ће assumption that the odometry error in each data segment dl 
is small enough to be neglected; thus, in practice one has to adjust the size of each dl 


in accordance to the quality of the robot's odometry. 


i 
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Algorithm layered EM mapping(d): 


initialize m by uniform-map 

с = 1.0 

fort = 0 to T do 
m) = occupancy grid(o(?) 
Bel(s(?) = uniform() 


repeat until satisfi ed 
a = Dirac((0, 0, 0)) 
fort — 1 to T do 
a) = [ee P(o® | s(9, тб), $60) Bel(s(t’)) 
. f P(s® | а%—,в%-—1)) at) ds) 
BT) = uniform() 
fort = T — 1 down to 0 do 
BO = [ P(s | a, s¢+D) {Ж 
Р(о%+1) | s+) mt), s) Bel(s*?) BED) 18%+1) 
fort — 1 to T do 
Д5, 
Bel(s(?) — n (a BO) =. 


с = 9a 


m = (m), argmax, Bel(s)),m™,...,m 7, argmax, Bel(s(?)) 


m = occupancy grid({m), argmax, o) Bel(s(), m, ..., 
m, argmax,r) Bel(s‘7))}) 


return m 


Table 13.2 The layered EM map learning algorithm. 


13.7 SUMMARY 


The EM algorithm builds maps by maximizing likelihood. It has been shown 
to build large-scale metric maps, thereby resolve ambiguities and exploiting all 
information in the data. 
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13.8 BIBLIOGRAPHICAL REMARKS 


The problem of map acquisition has attracted a lot of attention in the last decade. 
The literature commonly distinguishes two types of maps, robot-centered and world- 
centered maps. 


=  Robot-centered maps memorize the sensor measurements a robot is expected 
receive at each location (or pose). In the general case, such maps are three- 
dimensional, parameterized by the robot's three-dimensional pose. In certain 
cases, however, two dimensions are sufficient. Example include robots whose 
sensor measurements are rotationally invariant (e.g., a "general brightness" sen- 
sor), or robots for which a sensor measurement taken with a specific heading 
direction 0 is sufficient to compute sensor measurements under arbitrary heading 
directions 0 (e.g., omni-cams, which cover a 360 degree field of view). 


=  Incontrast, world-centered maps memorize the location of surrounding objects 
in world coordinates. For robots equipped with sensors that operate in a two- 
dimensional plane, such as most robots equipped with range finders, it suffices 
to memorize two-dimensional maps. Other sensors, such as cameras, typically 
require three-dimensional maps. 


The advantage of robot-centered maps lies in the fact that they are almost universally 
applicable, no matter what sensors being used. Unlike world-centered maps, robot- 
centered maps do not require that correspondence is established between sensor input 
and the location of objects in the outside world. However, this advantage comes at 
a price. World-centered maps can usually be constructed from much less data than 
robot-centered maps. Why? Knowledge of the relative location of objects makes it 
possible to deduce how the world looks from nearby places. For example, picture 
a robot with a laser range finder located in a convex room, such that all walls are 
within the limits of the sensor's perceptual range. A single scan can be sufficient to 
determine the location of all walls; and geometry can be applied to predict sensor scans 
from other viewpoints. This advantage is not shared by robot-centered maps, since the 
transformation from one sensor view to another is, in the absence of geometry, far 
from trivial. 


A second taxonomy, frequently discussed in the literature, distinguishes topological 
and metric approaches. 
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= Topological algorithms represent maps as graphs, where edges correspond to 
places, and arcs to paths between them. For example, places might be locations 
with specific distinguishing features, such as intersections and T-junctions in of- 
fice building, and arcs may correspond to specific behaviors or motion commands 
that enable the robot to move from one location to another, such as wall follow- 
ing. Recently, it has become popular to augment topological maps with metric 
information (relative distance, angle) to facilitate to disambiguation of similarly 
looking places. 


= Metric approaches embed maps into the plane. Locations in metric maps are 
memorized along with their global coordinates, typically expressed by x-y coor- 
dinates and, in certain case, orientations 0. At first glance, it appears that metric 
maps are more difficult to construct, since they have to be globally consistent. In 
practice, however, metric information has been found to be seminal for the dis- 
ambiguation of alike-looking places, and the metric paradigm has clearly led to 
larger and more consistent maps. 


Coincidentally, there is a strong correspondence in the literature between both tax- 
onomies. Most topological approaches rely on robot-centered maps, and most metric 
approaches build world-centered maps. 


14 


FAST INCREMENTAL MAPPING 
ALGORITHMS 


14.1 MOTIVATION 


In the previous two chapters, two principled solutions to the concurrent mapping and 
localization were presented. Both solutions are statistically sound. They both incor- 
porate uncertainty in robot poses and maps. However, both suffered important de- 
ficiencies. The Kalman filter approach cannot cope with ambiguities in perception, 
that 15, it requires that the data association problem is solved. It is also limited to a 
relatively small number of features. Both limitations preclude its application to build- 
ing high-resolution structural maps. The EM approach, on the other hand, provides 
a sound solution to the data association problem. However, it does not work in real- 
time, and all data has to be memorized. This is an important limitation. It precludes, 
for example, using this algorithm for real-time exploration and mapping of unknown 
environments. 


This chapter discusses a family of algorithms that can generate maps in near real- 
time, while simultaneously coping with the data association problem. The core of 
the method is non-probabilistic. Instead of representing the uncertainty in map esti- 
mates, it only calculates a single map, which is obtained as the maximum likelihood 
solution to the incremental mapping problem. Clearly, the lack of a representation of 
uncertainty in the estimates causes problems. In particular, the core algorithm is in- 
capable of mapping cyclic environments. Cycles are generally difficult to map, since 
the robot has to reconnect to a previously mapped area from a different direction. To 
overcome these problems, we augment the algorithm with a probabilistic estimator, 
a posterior estimator over robot poses. This estimator models the error of the maxi- 
mum likelihood method in pose space. It makes it possible to handle problems with 
large cycles, while simultaneously generating maps in near real-time. The algorithms 
presented here have practical significance. For specific type of sensors, such as laser 
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range finders, these algorithm have been shown to produce maps of relatively large 
environments in real-time. They are also relatively easy to implement, and therefore 
have been popular in mobile robotics. 


Towards the end of this chapter, we will discuss extensions to two important mapping 
problems that are mostly unexplored in robotics: The multi-robot mapping problem, 
in which a team of robots collaboratively acquire a map, and the 3D mapping problem, 
in which a robot seeks to generate a full three-dimensional map of its environment. As 
we will see, algorithms for multi-robot mapping and 3D mapping can be implemented 
with little effort on top of the algorithms discussed in this chaptr. 


The chapter is organized as follows. 


m Section 142 defines the incremental mapping problem and derives the basic algo- 
rithm for buiding maps incrementally using the maximum likelihood estimator. 


m Section 14.3 discusses gradient descent strategies fo hill climbing in log likeli- 
hood space, with the goal of finding maximum likelihood maps in continuous 
search spaces. It provides concrete gradient descent algorithms for some of the 
perceptual models and motion models discussed in this book. 


m Section 14.4 introduces the idea of simultaneously estimating a posterior over 
robot poses, and shows how to accommodate large residual errors when closing a 
cycle. It presents the main algorithm in this chapter, which combines incremental 
maximum likelihood estimation with a MCL-style posterior estimator. 


m Section 14.5 discusses extensions of the basic algorithm to the collaborative 
multi-robot problem. The resulting algorithm is capable of fusing data from mul- 
tiple robot platform whose initial pose relative to each other is unknown. How- 
ever, they assume that there is a designated team leader; and other robots start off 
in the map built by the team leader. 


m Finally, Section 14.6 presents an extension of the basic algorithm that enables 
a robot to build a three-dimensional map of the environment. These 3D maps 
combine structural and texture information, and lead to models similar to the 
ones used in video games. They are acquired using a robot equipped with two 
laser range finders and a panoramic camera. 
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14.2 INCREMENTAL LIKELIHOOD 
MAXIMIZATION 


The rationale behind both algorithms is that of likelihood maximization. Instead of 
computing a posterior over maps—which is the central objective in the next chapter— 
the algorithms described here seek to compute the most likely map only, along with 
the most likely robot poses. This is formally denoted by: 


argmax P(m,s(9 ...,s |d) (14.1) 


ms) ...,s (0 


Unfortunately, as we will see in the following two chapters, computing this expres- 
sion is highly intractable. Thus, the approaches described in this chapter implement 
incremental maximum likelihood estimators. They compute a sequence of poses and 
maps 


(m9), 809}, (m), 800}, (m, sy, NE (14.2) 


each by appending the most recent sensor measurement to the growing map. Notice 
that these estimates are not probabilistic! Instead of calculating a distribution over 
maps and poses, here we are only concerned with generating a single map, and a 
single pose. This restriction, which greatly reduces the computational burden when 
compared to a fully probabilistic approach, is the main source of brittleness of incre- 
mental maximum likelihood mapping. Further below, we will add a probabilistic pose 
estimator that will enable us to recover from certain estimation errors. 


The first question we have to ask is: How can a robot find such an incremental se- 
quence of maps and poses? Let us begin by restating the basic Bayes filter, which is 
at the heart of so many algorithms described in this book: 


Bel(s0) = P(s |d) (14.3) 
n P(o? | s,m) [Pe | a@-)) 0—10), т) Bel(s(*-9) 480—1) 


In incremental concurrent mapping and localization, we seek to compute the belief 
over maps as well as the belief over poses. Thus, we have to compute the belief over 
maps and poses. Additionally, it is convenient to index the maps by time, since in the 
incremental approach we will generate a whole sequence of maps. Hence the Bayes 
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filter for concurrent mapping and localization looks as follows: 


Bel(s,m) P(s,m |d) 


Bel(s*-), т) 4520 dm@-) 


This important equation extends Bayes filters to estimating maps. Unfortunately, 
Bayes filtering is inapplicable in practice. While Bayes filters can be calculated for 
localization alone with relative ease, this is not the case any longer in the concurrent 
mapping and localization problem. The reason for this is simple: tractability. While 
a robot pose is a three-dimensional quantity, a map is often described by many thou- 
sands of values. 


Maximum likelihood estimation is a simpler problem than the full posterior compu- 
tation. In maximum likelihood estimation, we seek to compute the maximum of the 
posterior probability: 


(800% m®*) = argmax Bel(s(?, m(?) (14.5) 


g(t) ym (t) 


Clearly, the maximum likelihood estimate carry a notion of its own uncertainty; how- 
ever, as we will see in this and future chapters, computing the map that maximizes 
likelihood is a challenging problem. While the maximum likelihood estimator is gen- 
erally easier to compute than the full posterior—after all, this is 'just' the mode of the 
posterior—even this problem is highly intractable. Incremental maximum likelihood 
estimators decompose the maximum likelihood estimation problem into a sequence 
of local, tractable problems. More specifically, they assume at each point in time t, 
one is readily a map and a pose estimate from the previous time step, that is, which 
incorporate all data up to time t — 1. They then calculate the maximum likelihood map 
and pose at time ?, an operation that can be performed efficiently. 


Let us describe this a bit more formally. The incremental maximum likelihood esti- 
mator maximizes the following conditional belief: 


Bel(s(?,m(0|s(t7D, md) = P(s,m™ |d, (t7D, mY) (14.6) 


which is the probability of the map and pose at time t given the data and the map and 
pose one time step earlier. The Markov assumption renders data gathered before time 


п P(o® | 09, тб) f [ Ре®,т® \а@-9,в@—1), т 


=p) 


(14.4) 
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t — 1 conditionally independent of the map and pose at time t given the map and pose 
at time t — 1. Hence, we can write 


Bel(s(? , тб s 7D, mV) = P(s©,m© | об, aD 8—1), тА) 


that is, only the data gathered between time t — 1 and t have to be considered. Notice 
the resulting incremental estimator, which is being maximized here, is much simpler 
than the one in (14.4) above: 


Bel(s(? , тб |8010), 070) 
= 1 P(o | s,m) P(s(9, тб | a€-D, 10, më) (14.8) 


Why is this simpler? In comparison with (14.4), one does not have to integrate over 
maps and poses any longer. In fact, the incremental posterior probability (14.8) can 
be computed in closed form, making it extremely convenient for on-line estimation. 
However, the question remains as to how incremental maximum likelihood approaches 
obtain pose estimates and maps at each point in time, and why it suffices to keep track 
of a single pose and map only, instead of a distribution over poses and maps. 


The first question is easily answered. Incremental maximum likelihood algorithms 
compute the map and pose pose at each time t by maximizing (14.8): 


(s,m) =  argmax Bel(s,m©|s¢-), mY) (14.9) 


800) am C) 


Even this calculation can be difficult, since the robot operates in a continuum and 
hence the space of all maps and poses cannot be searched efficiently. However, as we 
will see below, efficient hill climbing methods exist that, given a good starting point, 
tend to converge to a global optimum quickly. The resulting algorithm has the nice 
property that it can be applied in real time. 


The second question is more difficult to answer. By retaining only the most likely map 
and pose at each point in time, the algorithms described in this chapter are brittle. In 
certain environments, the likelihood function decomposes nicely and the brittleness 
can be tolerated, In others, such as environments with large cyclic corridors, the de- 
composition is invalid and the resulting maps are not only incorrect, but as a whole 
score poorly in their overall likelihood—despite the fact that the stepwise likelihood 
has been maximized. We will explicitly discuss such cases below, and offer a fix by 
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1: Algorithm incremental ML mapping(o, a, s, m): 


set s' = argmax; Р(8|а, s) 
3: repeat until satisfi ed 
s' — 8 + &V y [log P(o|s', m) + log P(s'|a, s)] 


5: т! —— m with (s', o) 


return (m , s") 


Table 14.1 The basic incremental maximum likelihood algorithm for concurrent mapping 
and localization. It accepts as input a map and a robot pose, along with an action idem and 
a sensor easurement. It outputs an updated map and pose estimate. Notice that none of the 
representations are probabilistics, making this algorithm brittle in practice. 


including a posterior estimator for the robot's pose, along with an algorithm that per- 
forms occasional adjustments. Unfortunately, additional brittleness is introduced by 
the hill-climbing nature of the maximum likelihood estimator. If the estimator gets 
stuck in a local minimum, the algorithms described here cannot recover. In practice, 
this severely limits the types and sizes of environments that can be mapped with the 
incremental maximum likelihood approach. Nevertheless, for accurate sensors rela- 
tively large and accurate maps can be built in real-time, giving this algorithm some 
practical importance. 


14.3 MAXIMUM LIKELIHOOD AS 
GRADIENT DESCENT 


14.3.1 Search in Pose Space 


We will now consider the question as to how to maximize the incremental likelihood 
function (14.9). This incvolves search in the space of all poses and all maps at time t. 


However, is we employ a fixed routine for map building (e.g., occupancy grid map- 
ping), it suffices to search the space of all poses. This reduces the computation sig- 
nificantly, as the space of poses if three-dimensional, where the space of all maps is 
huge. To see, let us consider the sigutation where the robot pose is known. Following 
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our assumptions, the t-th map m? is generated from the previous map m- P by a 
fixed, deterministic routing (e.g., occupancy grid mapping). Thus, the only parameter 
that remains to be optimized is the pose: 


в'9 = argmax Bel(s(? ,m( (800, 0 т 7D)|g 7D. mV) (14.10) 


g(t) 


Here m4 (5, o(0, m—)) denotes the map that is obtained by incrementally in- 
corporating the observation o? to the map т(*—!) at the alleged pose 5“). A little 
thought should convince the reader that one can safely replace this map by the most 
recent sensor measurement o(? only (since the pose is known), giving us 


S0 ж argmax Bel (s? , 001802), q, 470) (14.11) 


g(t) 


which, according to the definition of the incremental belief (14.8) can be spelled out 
as follows: 


800 = argmax P(o(? | 3) t7 0) P(s | at=), g(t-D, mt-D) 
g(t) 


(14.12) 
or in log-likelihood form: 


s® = argmaxlog P(o | s,m @-Y) + log P(s | (*7D, 8—0, 70) 


500) 


(14.13) 


To summarize, we have exploited the fact that growing the map is done by а fixed 
routine, which saves us the effort of searching the space of all possible maps. The 
resulting maximum likelihood expression only searches in the space of all poses. 


The resulting expression (14.13) is now simple enough to be searched directly. In 
particular, we need to search the space of all poses at time £ to find the pose that 
maximizes the incremental posterior. A common strategy for searching continuous 
spaces is gradient descent in log likelihood space. Luckily, all of the probabilities in 
(14.13) are differentiable. The gradient of the log of the probability on the right-hand 
side of (14.13) with respect to the desired pose is commonly written as 


V. log P(o9 | (9, m 7) + log PEO | al, 5-9, m6-D) — (14.14) 
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Since gradients are additive, we can decouple the gradient of the perceptual probability 
from that of the motion model: 


V о log P(o(? | 8, mY) + V о log P(s(? | а, 0, ті 4.15) 


Gradient descent incrementally changes the estimate 800) in the direction of the gradi- 
ent. That is, we apply the following search algorithm 


s «< в®+к [© log P(o® | в®,т®—)+ 


Vm log P(s(? | a879, (7D, (t7 0) (14.16) 


Here к > 0 isa step size close to zero that is required for gradient descent. The starting 
point for gradient descent is obtained by simply applying the most recent action item 
(odometry reading) to the pose: 


s® = argmax олла. (14.17) 


g(t) 


which can be computed using the kinematic motion models discussed in Chapter 5 
with noise parameters set to zero. Gradient descent then applies the update rule (14.16) 
repeatedly, until a convergence criterion is met. 


14.3.2 Gradient Calculation 


What remains to be discussed are the technical details of calculating the desired gradi- 
ents (14.16). Before we give details for specific perceptual models and motion models, 
it is important to emphasize that the calculation of gradients of differentiable functions 
is a purely mechanical exercise, though sometimes tedious. Gradients of log likeli- 
hood functions often look complex; however, they are obtained by differentiating any 
of the motion models or perceptual models discussed in this book, using the well- 
known rules for differentiation. The models discussed in this book are all piecewise 
differentiable, hence gradients can be calculated for all of them. 


Table 14.2 shows an algorithm for calculating the first derivative of the log-perceptual 
model 


Vc log P(o | (0, m2) (14.18) 
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1: Algorithm first derivative log range finder model2(o, s, m): 
2 dx = dy = 10 = 0 
3 for all ок € o do 
4 if ok < Omax — Osmall 
5: Lo, = 2 + жк cos 0 — ук sin 0 + ок cos(0 + Өк) 
6 Uo, = y + yx cos Ө + zy sin 0 + op sin(0 + Өк) 
7, Оты, = —rz,sin 0 — ук cos 6 — ок sin(0 + Өк) 
8 220 = —yy sin 0 + ть соз 0 + ок cos(0 + Өк) 
9: (2, y) = argmin,, y { (жо, im 27]? T (Yor T y»? | (a, y) occupied } 
10: dist? = (zo, — Z)? + (yo, — 9)? 
11 Odist^ _ (ax 2) 
12 E 2( ч ) 
ду ^ “Yor V 
Р _\ 0X6 _\ OYo 
13 ойшы" —2 | (ao, — 2) ag + (You — 0) Hee 
1 
14 а = Zit TT 
1 dist? | 
15 == т 
16 с= ! zw 
Omax 
a Odist? 
17 Bro ae oe 
18 Ob 1 Odist? 
Oy 202. Oy 
Ob 1 ddist? 
19 00 — бо? 00 
20 log q = log(ae® + c) 
д1 = > ab 
21 RS = crag? Ox 
o Ob 
22 nr = Saat ду 
: ip = шей 
стае 
24 ах+ = 21984 
Әх 
_ dl 
25 dy4- — Sv. 
О 
26 10+ 981 
P return (dx, dy, 40) 


Table 14.2 Algorithm for computing the fi rst derivative of the log probability computed 
by the range fi nder model in Table 6.3 on page 143. 
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Sd gv gross bs 


Algorithm first derivative log motion. model odometry(s, a, s): 


бо = atan2(y' — 9,27 — 3) — 0 


Otrans = /(- g) 2 (y = y)? 
2 = 0' — 0 — оь 


бо = atan2(y! — y, x’ — x) — 0 
dtrans = y (2 — z^)? TÍU-— yy 
à 


, 
rot2 — 0 —0— бов 
frot 999. frot 999 

Өт ci ар ci 
Odtrans — s'r. Odtrans — V —V 
Ox! Stans Oy btrans 


argi; = Óroti = дров: argi2 = Олдо г] 


ато = Ótrans — trans; argoo = оз Ótran. 
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F ардан " 
s + o4 (drot1 + Órot2) 


arg31 = Órot2 — Órot2; arg3o = Обов А 


D O2Ótzans 


Oargi; _ Ou. ATEI _ _ Ou. дагву _ 0 
Ox! Onl 9079 — Oy’ ^ OOF ~~ 
Oargin — Әд.окі | Odtrans . дагвіз froti | Ə trans . Әагвуз _ 
Әх” = Ol өт t On дш; ay = O1 ay + On 008; ge = 0 
ðargəı — _ Odtrans. ӨёїБо1 — _ Əftrans . Әагбәр _ 0 
Du s Oa 2 ду oy * O00. 7 
darg 3 trans дб ‚ darg Obtrans Әд.окі . Garg 
M = O3 Duns + а4 Tm А op = O3 Dy! + од ow А эб = Q4 
дагвзу _ Osea — oe ES 
Or! ^—— 0; Oy’ = 0; 900! ^ 1 
дат зо __ Odtrans . Әагвзә _ Odtrans . дагазә _ 
Oa! 02 Dg! * Oy  — O2 gy * дө = A 
fori = 1 to3 
pi = prob(arg;, arg;2) 
Op, _ Üprob(arg;,arg;) даг | Oprob(argi,argi2) даг; 
Ox! Oargii дате Ox! 
Opi . aprob(arg., ,arg;2) Oargi | Bordb(ate, Саты) дате; 
Oy’ . Oarg;i ду! argiz Oy’ 
Opi _ Oprob(arg;, ага) датұ;1 ED Oprob(arg;; argiz) Oarg io 
00! . дате ;1 06’ дате jo 06’ 


p= X log р; 


Op | NcC3 1 др:. др _ 

az? ae pi дх'° ду! 
Op Op Op 

return (927, ду”? Әр”) 


уз 1 Opi. 
© 4м4=1 p; Oy’? 00! — 


Op _ 


Table 14.3 Algorithm for computing the fi rst derivative of the logarithm of the motion 
model log P(s’ | a, s) based on odometry information (see Table 5.5 on page 108). 
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1: Algorithm first derivative prob normal distribution(d, b): 
. m И] i12? 

De prob = от € 25 

3: Sprob = —prob¢ 

4: о = (> — т(2т0)-#) prob 

9; return (po. prob Б 


Table 14.4 Algorithm for computing the fi rst derivative of the zero-centered normal dis- 
tribution with variance b (see Table 5.2 on page 97). 


for the perceptual model discussed in Chapter ?? (see Table 6.3 on page 143). The 
calculation of the gradient is fairly straightforward, and we arranged the algorithm in 
Table 14.2 in a way that should make it easy to understand the mechanics of calculating 
the desired derivative. Similarly, Table 14.3 depicts an algorithm for calculating the 
first derivative of the logarithm of the odometry-based motion model, 


Vz) log P(s(? | а, s(t70) (14.19) 


which was originally discussed in Chapter ?? (see Table 14.3 on page 402). This al- 
gorithm requires the first derivative of the zero-centered noise model prob(), which 
is provided in Table 14.4 (zero-mean normal distribution). While this model is not 
exactly the one required in (14.16)—it does not take the model m into account—it is 
close (the extension is trivial). It demonstrates one more the mechanics of calculat- 
ing a derivative of a complex function using the chain rule of differentiation. When 
implementing fast gradient descent, we strongly recommend to calculate the gradients 
by hand, instead of blindly copying the algorithms listed here. 


14.3.3 Suggestions for the Implementation 


Our algorithm for scan alignment is asymmetric. It only relies on measurements in o 
that fall into the free-space of the map m. What it not uses is the converse: if the map 
shows an occupied region in an area that, according to the scan s, should be free— 
this “mismatch” is not used for adjustment. Clearly, this is a deficiency of the basic 
perceptual model. If possible, we recommend to implement a symmetric version of 
the model, which penalizes both: measurements in s that fall into ™m’s free-space, and 
regions in m that fall into s’s free-space. The resulting perceptual model is slightly 
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more complex. Calculating the gradient with respect to the pose s' is analogous to the 
algorithm given here. 


Additionally, the gradient calculation can be sped up significantly by caching specific 
information. If implemented as described above, we found that only approximately 
ten gradient calculations could be carried out per second, using 500Mhz Pentium II 
Computers running Linux. By far most of the time is consumed calculating the per- 
ceptual model and its first derivative, which spend most of the time with a single 
operation: The search for the nearest occupied location, which is performed in line 
9 in the gradient algorithm shown in Table 14.2 (and also line 9 in the algorithm 
range finder model2 shown in Table 6.3 on page 143). This search is expensive; 
furthermore, when changing the estimated pose s(?, the search has to be repeated. 


Alternatively, one can first calculate the location of the nearest occupied location for 
a fine-grained grid, overlayed the x-y-space. Calculating this grid takes some start-up 
time, however, with it the search reduces to a table-lookup operation. The result of 
the search is then only approximate, as not all locations within a grid cell necessarily 
map to the same nearest neighbor. Using this technique with grid resolutions between 
10 and 20 cm, we found that the resulting gradients were still well-suited for hill- 
climbing in likelihood space. Moreover, we found that the calculation of the gradient 
could be sped up by two orders of magnitude, making it possible to perform hundreds 
of iterations per second. 


14.3.4 Examples 


Figure 14.1 shows two example sequences (arranged vertically) of applying gradi- 
ent descent for scan alignment. In both examples, the likelihood is maximized using 
the function first derivative log range finder model2 in Table 14.2. Both examples 
were recorded by a moving robot. We manually perturbed the second scan relative to 
the first by 10 cm along each axis, and 30 degrees rotational error. 


Despite this large error, the routine reliably converges to the "right" alignment in ap- 
proximately 100 iterations. Shown in Figure 14.1 are the initial alignments, and the 
alignments after 10, 50, and 100 iterations of gradient descent. These examples sug- 
gest that the likelihood function possesses few, if any, local minima in the proximity 
of the global optimum. In practice, we found that for scans recorded in sequence (e.g., 
with 3 Hertz), the gradient descent search routine practically always converges to the 
right maximum and aligns the scans with high accuracy. 
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Figure 14.1 To examples of gradient descent for aligning scans (arranged vertically). In 
both cases, the initial translational error is 10 cm along each axis, and the rotational error 
is 30 degrees. The gradient descent algorithm safely recovers the maximum likelihood 
alignment. 
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mismatch — —- 


Figure 14.2 A typical map obtained using the most simple incremental maximum likeli- 
hood approach. While the map is locally consistent, the approach fails to close the cycle 
and leads to inconsistent maps. This elucidates the limitations of stepwise likelihood maxi- 
mization, and illustrates the diffi culty of mapping cyclic environments. 


14.3.5 Limitations 


The basic limitation of stepwise likelihood maximization is that it can fail to generate 
globally optimal solutions. The problem is typically not observed when mapping a 
single hallway, or a simple non-cyclic environments—since here local consistency be- 
tween adjacent measurements suffices to build a globally consistent map. However, in 
cyclic environments the robot has to establish correspondence to previously gathered 
data with potentially unbounded odometric error, and has to revise pose estimates 
backwards in time. Consequently, local consistency as achieved by the incremental 
maximum likelihood method is insufficient to achieve global consistency. 


Figure 14.2 illustrates this problem using an example. It shows a map, acquired in a 
cyclic environment. The robot traversed a sequence of corridors, finally reaching its 
initial location. While along the way the maps appear locally consistent, it neverthe- 
less accrues some localization error. As the robot closes the cycle, the accumulated 
localization error is too large to be accommodated locally, and the inability to cor- 
rect maps backwards in time leads to a permanent mismatch shown in that figure. The 
resulting map happens to be still good enough for navigation. It is not difficult to imag- 
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ine, though, that larger cycles will lead to bigger mismatches that make it impossible 
to navigate. 


In summary, the incremental maximum likelihood approach suffers two main limita- 
tions: 


1. It is unable to cope with large odometry error. 


2. It is unable to correct poses backwards in time. 


Next we will describe an extension that addresses these two shortcomings, leading to 
an algorithm that has been found to work reliably in environments with cycles. 


14.4 INCREMENTAL MAPPING WITH 
POSTERIOR ESTIMATION 


To overcome the cycle mapping problem, it is necessary to revise past pose estimates 
backwards in time. This has implications for the way data is memorized. If, for 
example, we only maintained a single occupancy grid at each point in time, it would 
be impossible to modify the grid in response to an inconsistency that was discovered 
when closing a cycle. Thus, it is necessary to memorize past sensor measurements 
along with their estimated pose, so that pose estimates can be corrected retrospectively. 


14.4.1 Detecting Cycles 


Next, we need a mechanism for detecting cycles. This is realized by introducing 
a second estimator, which performs full posterior estimation over the robot's poses, 
concurrently to the map estimation. Posterior pose estimation is the same as localiza- 
tion. Robot pose estimators were already discussed extensively in the chapters on state 
estimation (Chapter 2) and localization (Chapter 7) . Let us briefly restate the basic 
filter equation: 


Bel(z(?) = POPEO, mO) f Р(а®|в%-®,а%-®,т®-®) Bel(a 9) 4420) 


Where x /? is the pose of the robot at time t. An algorithm for posterior estimation 
over poses was given in Table 7.1 on page 163. 
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When implementing the localization algorithm, one has to take into account that it is 
run concurrently with an incremental map estimator, which already estimates robot 
poses. At the very least, it must be consistent with the maximum likelihood estimator 
in non-cyclic situations. Thus, the term P(z(? |а 1), z*79, m) is not a model of 
robot motion. Instead, it is a model of the residual error induced by the maximum 
likelihood estimator. To be consistent, the mode of P(a |a}, 0—1), m), 


argmax P(z(? |a ^79, 20-0, т) (14.21) 


a(t) 


must equivalent to the result of the maximum likelihood pose estimator (14.13). The 
uncertainty in the conditional reflects the posterior uncertainty of this estimate. As- 
suming that the uncertainty it normally distributed with zero mean, its inverse covari- 
ance matrix can be recovered from the second derivative of the log likelihood function 
in (14.13) (notice that the second derivative of the log of a multivariate normal dis- 
tribution is always the inverse of its covariance). Similarly, one can use any of the 
motion models described in this book and “guess” the motion noise parameters by 
trial-and-error. In either case, P(x |a“-)), z(*-U, т) should be a much more fo- 
cussed probability distribution than the original robot motion model, since it models 
the uncertainty after correcting the pose based on a sensor measurement. 


Similarly, the term P(o“)|a,m™) is not the perceptual probability. If one were to 
use the perceptual probability, the information from the most recent sensor scan would 
be used twice—once in the maximum likelihood estimator and once in the posterior 
estimate—and the robot could become falsely over-confident of its pose. Instead, 
we suggest to evaluate P(o |20, m?) only with regards to sensor measurements 
that create conflicts when closing a cycle. Such past measurements can easily be 
identified, by keeping track of their time and their maximum likelihood location. As a 
result, the posterior 5 el(a(?) grows as the robot moves into unexplored terrain. When 
reconnecting with a previously mapped area, its uncertainty will decrease as a result 
of factoring in the perceptual probability P(o |a (9 , m(?). 


14.4.2 Correcting Poses Backwards in Time 


For the purpose of mapping, the robot’s pose estimate is now replaced with the mode of 
the posterior Bel(z(?). In other words, instead of using (14.13) as the pose estimate, 
it is merely used to define the posterior P(z(?|a(*-9, z(*7), т). When the robot 
moves into unexplored terrain, argmax,( Bel(z(?) will be identical to the estimate 
of the maximum likelihood estimator. However, when closing a cycle, both estimates 
may differ. 
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1: Algorithm incremental ML mapping for cycles(o, a, s, m, Bel(s)): 


; (m/, s') = incremental. ML mapping(o, a, s, m) 
3: Bel(s') = P(o, s") f P(s'|a, s) Bel(s) ds 


s" = argmazs Bel(s’) 


5: if s" # 8' 

linearly distribute s" — s' along cycle 
7: refi ne past poses with algorithm incremental МЇ, mapping 
8: return (mv , s", Bel(s’)) 


Table 14.5 Extension of the algorithm incremental ML. mapping for mapping cyclic 
environments. The algorithm carries a posterior estimate over poses, along with the step- 
wise maximum likelihood map. When a cycle is closed, the maximum likelihood estimate 
s' and the mode of the posterior s" will differ, initiating the backwards adjustment of robot 
poses. 


To maintain consistent maps, these differences have to be propagated backwards in 
time, along the cycle. This is again a nested maximum likelihood estimation problem. 
However, care has to be applied with regards to computational efficiency. One way to 
implement the backwards correction efficiently is the following two-phase algorithm: 


1. The deviation between the stepwise maximum likelihood estimate and the mode 
of the posterior is distributed linearly among all poses along the cycle. Calcu- 
lating linear correction terms is extremely fast and places the robot poses some- 
where close to their desired corrected estimates. However, the optimization prob- 
lem is highly non-linear. 


2. Subsequently, the incremental maximum likelihood estimator is applied back- 
wards in time to maximize the probability of poses given their neighbors (in 
time). This algorithm is very similar to the one above, hence will not be de- 
scribed here in detail. 


Table 14.5 sketches the resulting algorithm, which simultaneously maintains a maxi- 
mum likelihood map and a full posterior estimate over poses. When a cycle is closed, 
the maximum likelihood pose estimate and the mode of the posterior will deviate, 
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which is detected in Step 5 of the algorithm. Steps 6 and 7 correspond to the mecha- 
nism for refining poses backwards in time. 


14.4.3 Illustrations 


Figure 14.3 shows an example of applying this algorithm in practice, generated from 
the same data that was used to generate the map shown in Figure 14.2. Here the poste- 
rior estimate is implemented using Monte Carlo localization (MCL). In the beginning, 
as the robot maps unknown terrain, the sample set spreads out, though it does not 
spread quite as much as a raw robot motion model would suggest. This is shown in 
Figure 14.3a, which shows the posterior samples along with the map obtained by the 
incremental maximum likelihood estimates. 


Figure 14.3b shows the situation just before closing the cycles. As before, the robot has 
accrued significant pose estimation error in its maximum likelihood estimate. How- 
ever, this error appears to be well captured in the posterior estimate of the pose, rep- 
resented by the particle set. Figure 14.3c depicts the situation after closing the cycle. 
The posterior estimate is now very focussed, since it incorporates perceptual measure- 
ments relative to earlier parts of the map. 


Despite its reliance on maximum likelihood estimates, the algorithm incremen- 
tal ML mapping for cycles is very robust. Figure 14.4 shows the result of an ex- 
treme experiment, where this algorithm was applied to the mapping problem in the 
absence of any odometry (action) data. Consequently, the raw data is extremely dif- 
ferent to interpret. Figure 14.4a shows the data set in the absence of odometry infor- 
mation. Clearly, overlaying scans produces an image that is impossible to interpret for 
the human eye. Our algorithm, applied to this data set, nevertheless produces a rea- 
sonably accurate map, as shown in Figure 14.4b. The only modification to the basic 
algorithm is the omission of the gradient V y log P(s'|a, s), which is simply assumed 
to be zero due to the lack of action items a. The error accrued during mapping is 
approximately twice as large as with odometry information; however, as the cycle is 
closed, the residual error along the loop shrinks significantly and the resulting map is 
of the same quality as if odometry information were available. However, these results 
have to be taken with a grain of salt. Our approach will clearly fail in long, featureless 
corridors which lack the necessary structure for estimating poses from laser range data 
alone. All these results demonstrate, thus, is the robustness of the basic routine. 


All the maps above have been obtained in real time on a 300Mhz Pentium II Laptop 
PC, with a robot moving at approximately 60 cm/sec (see Figure 14.5). However, we 
should notice that the current approach is unable to handle nested loops, and if the 
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Figure 14.3 Mapping cyclic environments using particle fi lters for posterior pose estima- 
tion. The dots, centered around the robot, indicate the posterior belief which grows over 
time (a and b). When the cycle is closed as in (c), the map is revised accordingly and the 
posterior becomes small again. 
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Figure 14.4 Mapping without odometry. Left: Raw data, right: map, generated on-line in 
real-time. 


loop is very large, it may be impossible to correct the pose estimates along the cycle 
in real time. 


14.5 MULTI-ROBOT MAPPING 


Estimating a full posterior over poses has a second advantage: It makes it possible to 
localize a robot in the map built by another. This is an essential step towards multi- 
robot robot mapping, which requires that robots determine their poses relative to each 
other. If one robot is known to be started in the map built of another, this estimation is 
equivalent to global localization, which can be carried out using MCL. 


The basic algorithm for cooperative mobile robot mapping is very similar to incre- 
mental ML mapping for cycles stated in Table 14.5, with two modifications: First, 
the initial belief Bel(s‘) for the second robot is initialized by a uniform distribution 
over poses, instead of a point-mass distribution. Second, map measurements are only 
integrated when the robot feels confident enough that it knows its pose relative to the 
first. This is determined using the entropy of the posterior Bel(s(?): If the entropy 
H Bel(s(?)| falls below a certain threshold, localization is assumed to be completed 
and the second robot integrates its sensor measurements into the map of the first one. 


Figure 14.6 illustrates how a second robot localizes itself in the map built by another 
robot (called there: the team leader). The robots used here are Pioneer robots equipped 
with laser range finders, such as the ones shown in Figure 14.5. Figure 14.6a shows 
the belief after incorporating one sensor scan, represented by particles scattered al- 
most uniformly through the map built by the team leader. A few time steps later, all 
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Figure 14.5 RWI Pioneer robots used for multi-robot mapping. These robots are equipped 
with a SICK laser range fi nder. 


surviving particles are focused on the same pose. Now the robot knows with high 
probability its pose relative to the other robot's map. 


We finally briefly outline an architecture for fusing maps from many robots, oper- 
ating distributedly. If maps from many robots shall be integrated, one can build a 
hierarchical architecture. At the bottom of the hierarchy, each robot builds its own 
maps, estimating poses within its local coordinate frame. The corrected poses and 
measurements are than communicated to a higher level program, which uses the iden- 
tical mapping approach, to develop a single map by localizing each robot relative to 
its coordinate frame. The integration then continues recursively up the hierarchy, un- 
til the final module integrates the data of its submodules. The resulting map is then 
communicated back (at high frequency) to all modules, including the robots, to aid the 
process of localization. 


Such an architecture has been implemented and led to the results shown in Figure 14.7. 
This map was acquired in real-time using three robots, which locally corrected their 
pose estimates. A higher level module collected these pre-corrected data, and used the 
maximum likelihood estimator described above to build a single map for the robots. 
This map has been constructed in real-time, while the robot were in motion. This 
is necessary for robot that seek to systematically explore an unknown environment. 
More detail will be given in Chapter ??, where we talk about probabilistic strategies 
for robot exploration. 


(a) 


414 CHAPTER 14 


(b) 


team leader 


Figure 14.6 А second robot localizes itself in the map of the fi rst, then contributes to 
building a single unifi ed map. In (a), the initial uncertainty of the relative pose is expressed 
by a uniform sample in the existing map. The robot on the left found its pose in (b), and 
then maintains a sense of location in (c). 


Figure 14.7 (a) Team of 3 robots, which produced the map shown in (b) in real-time. Also 
shown in (b) are the paths of the robots. 


14.6 MAPPING IN 3D 


Finally, we will be interested in developing full three-dimensional maps if building in- 
teriors. Such maps offer three key advantages over the two-dimensional maps studied 
throughout most of this book. 


m First, 3D maps are easier to acquire than 2D maps, at least from a perceptual point 
of view. By looking at the full 3D structure of a building interior, the problem 
of estimating correspondence is simpler. This is because places that might look 
alike in 2D often do not in 3D, reducing the danger of confusing them with each 
other. 
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(b) 


Figure 14.8 (a) Pioneer robot equipped with 2 laser range fi nders used for 3D mapping. 
(b) Panoramic image acquired by the robot. Marked here is the region in the image that 
corresponds to the vertical slice measured by the laser range fi nder. 


m Second, 3D maps facilitate navigation. Many robot environments possess signif- 
icant variation in occupancy in the vertical dimension. Modeling this can greatly 
reduce the danger of colliding with an obstacle. 


m Third, many robot tasks require three-dimensional information, such as many 
tasks involving the localization and retrieval of objects or people. 


m Finally, 3D maps carry much more information for a potential user of the maps. 
If one builds a map only for the sake of robot navigation, then our argument does 
not apply. However, if map are being acquired for use by a person (e.g., a rescue 
worker entering a building after an earthquake), 3D information can be absolutely 
critical. 


Given these obvious advantages of 3D over 2D maps, it is surprising that the robotics 
community has paid so little attention to mapping in 3D. 


The estimation of 3D maps is a challenging computational problem, which is 
amenable to many of the techniques described in this book. However, even with pow- 
erful 2D mapping one can obtain reasonable 3D maps. In the remainder of this chapter, 
we will briefly show results of a routine for acquiring 3D maps. At its center is a Pio- 
neer robot equipped with two laser range finder and a panoramic camera. The robot is 
shown in Figure 14.8a. One of the laser range finders is directed forward, emitting a 
horizontal plateau of laser light. This laser range finder is used for concurrent mapping 
and localization using the algorithm described in this chapter. The second is pointed 
upwards, orthogonal to the robot's motion direction. This this laser the robot can scan 
the walls and the ceiling, gathering critical range information for building 3D maps. 
Finally, the panoramic camera is mounted adjacent to the upward laser scanner, en- 
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Figure 14.9 Views of the 3D structural map, for the high-res model (left column) and the 
reduced resolution model (right column). 


abling to acquire color information that are easily matched to the corresponding range 
measurements. Figure 14.8b shows an example of a panoramic image. The line in the 
image corresponds to the range data acquired by the upwards-pointed laser. 


Figure 14.9 shows images of a 3D structural map acquired in a multi-corridor indoor 
environment, after traversing it once. Localization is provided by the algorithm in- 


Fast Incremental Mapping Algorithms 417 


(b) 


(d) 


Figure 14.10 Snapshots of 3D texture maps, acquired using the modifi ed Pioneer robot. 
These views are generated using a VRML viewer. Some of these views are similar to those 
one would see when entering the building. Others show the building from perspectives that 
are not available in the physical world. 


cremental ML mapping for cycles. The vertical range measurements are then con- 
verted into polygones without additional pose estimation. The resulting map can be 
displayed in a VRML viewers, enabeling users to "fly through the building" virtually, 
without actually entering it. It also enables users to assume viewpoints that would be 
impossible in reality, such as the ones shown in Figure 14.9a and d. 


A limitation of this map is its very large number of polygones, which increases lin- 
early with robot operation. Simplifying polygonial maps has long been studied in the 
computer graphics literature, which concerns itself (among other things) with methods 
for fast rendering of complex polygonal maps. While the left column in Figure 14.9 
shows views from the high-complexity map, the right column stems from a map which 
possesses only 5% of the polygones of the left map. The reduction of the polygonal 
map is achieved thorugh a routine adopted from the computer graphics literature [10]. 
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The reduced map can be rendered in real-time without much of a noticeable loss of 
accuracy. 


Finally, Figure 14.10 shows images obtained from a full 3D texture map. This map has 
been obtained by projecting the color information gathered by the panoramic camera 
onto the structural model. This map can equally be rendered in VRML. Again, the 
texture information is not used for localization. Instead, the location estimates of the 
structural components and the texture is directly obtained from the 2D map, exploit- 
ing knowledge of the location of the upward pointed laser and the panoramic camera 
relative to the robot's local coordinate system. 


14.7 SUMMARY 


This section discussed a collection of algorithms that blend together maximum likeli- 
hood estimation and posterior estimation over poses. In particular: 


m We introduces a fast maximum likelihood algorithm for incrementally growing 
maps. 


m We showed how to implement this algorithm using gradient descent, providing 
example programs for calculating derivatives in log likelihood space for a specific 
perceptual and motion model. 


m We discussed the strengths and shortcomings of the approach. In particular, we 
showed that the incremental approach fails to find good maps in cyclic environ- 
ments. 


= We then introduced an extension which integrates а full posterior estimator over 
robot poses. We showed how to use these posterior estimates for resolving in- 
consistencies in cyclic environments. 


= We illustrated, through examples, the robustness of the algorithm, while pointing 
out that at the same time, the algorithm is incapable of accommodating nested 
cycles. 


m We discussed extensions to multi-robot mapping, pointing out that the posterior 
estimation enables one robot to localize itself in the map of another. 


= Finally, we showed examples of three-dimensional maps generated using a robot 
with an upward pointed laser range finder and a panoramic camera. 
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14.8 BIBLIGRAPHICAL REMARKS 


Shall we show images of Gutmann/Konolige? I'd opt for yes. 


14.9 PROJECTS 


1. Develop a 3D mapping algorithm that exploits the full 3D structure for pose and 
map estimation during mapping. 


2. The multi-robot algorithm described in this chapter requires that each robot starts 
in the map built by another robot. Develop a multi-robot mapping algorithm 
that relaxes this assumption, that is, where robots can start at arbitrary poses and 
might only later traverse the same terrain. 


3. Develop a 3D mapping algorithm that represent the enviroment by typical build- 
ing components, such as walls, doors, tables, chairs, instead of sets of polygones. 
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15 


MARKOV DEVISION PROCESSES 


15.1 MOTIVATION 


Thus far, the book has focused on robot perception. We have discussed a range of 
probabilistic algorithms that estimate quantities of interest from sensor data. However, 
the ultimate goal or any robot software is to choose the right actions. Accurate state 
estimation is only desirable insofar it facilitates the choice of action. This and the 
following chapter will therefore discuss probabilistic algorithms for action selection. 


To motivate the study of probabilistic action selection algorithms, consider the follow- 
ing examples. 


1. Aa robotic manipulator grasps and assembles parts arriving in random configu- 
ration on a conveyer belt. The configuration of a part is unknown at the time it 
arrives, yet the optimal manipulation strategy requires knowledge of the config- 
uration. How can a robot manipulate such pieces? Will it be necessary to sense? 
If so, are all sensing strategies equally good? Are there manipulation strategies 
that result in a well-defined configuration without sensing? 


2. Anunderwater vehicle shall travel across the North Pole. Shall it take the shortest 
route, running risk of loosing orientation under the Ice? Or sould it avoid large 
featureless areas at the risk of using more time to reach its goal. 


3. An robotic helicopter flies autonomously in a wooded area. The helicopter should 
stay clear of obstacles such as the ground so that even an unexpected wind gust 
can make it crash. But how far is far enough? 


4. A team of robots explore an unknown planet. The problem is particularly hard if 
the robots do not know their initial pose relative to each other. Shall the robots 
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seek each other to determine their relative location to each other? Or shall they 
instead avoid each other so that they can cover more unknown terrain? And what 
are the optimal ways to explore a planet with teams of robots? 


These examples illustrate that action selection in many robotics tasks is closely tight 
to the notion of uncertainty. In some tasks, such as robot exploration, reducing un- 
certainty is the direct goal of action selection. Such tasks are known as information 
gathering tasks. Information gathering tasks will be studied in the next chapter. In 
other cases, a reducing uncertainty is merely a means to achieving some other goal, 
such as reliably arriving at a target location. Such tasks will be studied here. 


From an algorithm design perspective, is convenient to distinguish two types of uncer- 
tainty: uncertainty in action, and uncertainty in perception. 


1. Deterministic versus stochastic action effects. Classical robotics often assumes 
that the effects of control actions are deterministic. In practice, however, actions 
cause uncertainty, as outcomes of actions are non-deterministic. The uncertainty 
arising from the stochastic nature of the robot and its environments mandates that 
the robot senses at execution time, and reacts to unanticipated situations—even 
if the environment state is fully observable. It is insufficient to plan a single 
sequence of actions and blindly execute it at run-time. 


2. Fully observable versus partially observable systems. Classical robotics often 
assumes that sensors can measure the full state of the environment. As argued 
repeatedly throughout this book, this is an unrealistic assumption. The lack of 
perfect sensors has two ramifications: First robot control must be robust with 
respect to current uncertainty. Second, it must cope with future, anticipated un- 
certainty, and choose actions accordingly. An example of the latter was given 
above, where we discussed a robot which has the choice between a shorter path 
through a featureless area, and a longer one that reduces the danger of getting 
lost. 


Throughout this chapter, we will take a very liberal view and make virtually no dis- 
tinction between planning and control. Fundamentally, both planning and control ad- 
dress the same problem: to select actions. They differ in the time constraints under 
which actions have to be selected and in the role of sensing during execution. The 
algorithms described in this chapter are all similar in that they require a off-line opti- 
mization (planning) phase, in which they calculate a control policy. During execution, 
the control policy can be invoked very efficiently, and it can cope with a range of dif- 
ferent situations. By no means is the choice of algorithms meant to suggest that this 
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is the only way to cope with uncertainty in robotics. Instead, it reflects the style of 
algorithms that are currently in use. 


The majority of algorithms discussed in this chapter assume finite state and action 
spaces. Continuous spaces are approximated using grid-style representations. The 
chapter is organized as follows. 


m Section 15.2 discusses in depth the role of the two types of uncertainty and lays 
out their implications on algorithm design. 


m Section 15.3 introduces value iteration, a popular planning algorithm. Value iter- 
ation, as introduced there, addresses the first type of uncertainty: the uncertainty 
in robot motion. It rests on the assumption that the state is fully observable. 
The underlying mathematical framework is known as Markov Decision Processes 
(MDP). 


m Section 16.2 discusses a more general planning algorithm that addresses both 
types of uncertainty: Uncertainty in action and uncertainty in perception. This 
algorithm adopts the idea of value iteration, but applies it to a belief space rep- 
resentation. The framework underlying this algorithm is called Partially Observ- 
able Markov Decision Processes (POMDPs). The POMDP algorithm can antici- 
pate uncertainty, actively gather information, and explore optimally in pursuit of 
an arbitrary performance goal, at the expense of increased computational com- 
plexity. 


m Finally, Section 16.5 discusses augmented MDP algorithms, which are cross- 
overs between MDP and POMDP algorithms. Augmented MDP algorithms con- 
sider uncertainty, but abstract away detail and hence are computationally much 
more efficient than POMDP solutions. 


In the following chapter, .... we will discuss a more general algorithms that addresses 
both types of uncertainty: Uncertainty in action and uncertainty in perception. The 
framework underlying these algorithm is called Partially Observable Markov Decision 
Processes (POMDPs). The POMDP algorithm can anticipate uncertainty, actively 
gather information, and explore optimally in pursuit of an arbitrary performance goal, 
at the expense of increased computational complexity. 
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Figure 15.1 Near-symmetric environment with narrow and wide corridors. The robot's 
initial location is known, but not its pose. 


15.2 UNCERTAINTY IN ACTION 
SELECTION 


Make Figure: Classical Planning 
Make Figure: MDP planning 
Make Figure: POMDP planning 


Figure 15.1 shows a toy-like environment that illustrates the different types of uncer- 
tainty. Shown there is a mobile robot in a corridor-like environment. The environment 
is highly symmetric; the only distinguishing feature are two walls at its far ends. At 
symmetric locations, the environment possesses a small number distinct places, one 
labeled the goal location (G), another which contains a pit (P), and one that contains 
a unique landmark that can help the robot finding out where it is. The robot's task is 
to advance to the goal location as quickly as possible while avoiding falling into the 
pit. Let us suppose the goal and the pit are perceptually indistinguishable—unless the 
robot actually enters the region, running risk to fall into the pit. Finally, we notice 
that there are multiple paths to either the goal, one that is short but narrow, and two 
others that are longer but wider. Clearly, this environment is less complex than natural 
robotic environments; however, it is complex enough to distinguish different types of 
uncertainty. 


In the classical robot planning paradigm, there is no uncertainty. The robot knows its 
initial pose and it knows the location of the goal. Furthermore, actions are executed 
in the physical world as commanded. In such a situation, it suffices to plan off-line 
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a single sequence of actions, which is then executed at run-time. There is no need 
to sense. Figure ?? shows an example of such a plan. Obviously, in the absence of 
errors in the robot actuator, the narrow shorter path is superior to the longer, wider 
one. Hence, a planner that strives for optimality would choose the former path over 
the latter. Many existing planning algorithm are only concerned with finding a path, 
regardless of optimality. Thus, they might choose either path. What happens if the 
plan is executed? Of course, physical robot hardware is inaccurate. A robot blindly 
following the narrow hallway might run danger of colliding with the walls. Further- 
more, a blindly executing robot might miss the goal location because to the error it 
accrued during plan execution. In practice, thus, planning algorithms of this type are 
often combined with a sensor-based, reactive control module that consults sensor read- 
ings to adjust the plan so as to avoid collisions. Such a module might prevent the robot 
from colliding in the narrow corridor. However, in order to do so it may have to slow 
down the robot, making the narrow path inferior to its alternative. 


A paradigm that encompasses uncertainty in robot motion is known as Markov deci- 
sion processes, or MDPs. MDPs assume that the state of the environment can be fully 
measured at all times. In other words, the perceptual model P(o|s) is deterministic 
and bijective. However, is allows for stochastic action effects, that is, the action model 
P(s'|s, а) may be non-deterministic. As a consequence, it is insufficient to plan a sin- 
gle sequence of actions. Instead, the planner has to generate actions for a whole range 
of situations that the robot might find itself in, either because of its actions, or be- 
cause of other environment dynamics. One way to cope with the resulting uncertainty 
is to generate a policy for action selection defined for all states that the robot might 
encounter. Such mappings from states to actions are also known as universal plans 
or navigation functions. An example of a policy is shown in Figure ??. Instead of a 
single sequence of actions, the robot calculates a mapping from states to actions indi- 
cated by the arrows. Once such a mapping is computed, the robot can accommodate 
non-determinism by sensing the state of the world, and acting accordingly. Addition- 
ally, this framework opens the opportunity to guide the action selection process based 
on future, anticipated uncertainty. Consider, for example, a highly inaccurate robot 
which, if placed in the narrow corridor, is likely to collide with a wall. Planning algo- 
rithms that consider motion uncertainty can assess this risk at planning time and might 
choose the wider, safer path. 


Let us now return to the most general, fully probabilistic case, by dropping the as- 
sumption that the state is fully observable. This case is known as partially observable 
Markov decision processes, or POMDPs. In most if not all robotics applications, mea- 
surements o are noisy projections of the state s. Hence, the state can only be estimated 
up to a certain degree. To illustrate this, consider the situation depicted in Figure ?? 
under the assumption that the robot does not know its initial orientation. Clearly, 
the symmetry of the environment makes it difficult to disambiguate the robot's pose. 
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By moving directly towards the projected goal state the robot faces a 5096 chance of 
falling into the pit—which would tell it where it is but result in mission failure. The 
optimal plan, thus, is to move to the small area in the upper right (or lower left), which 
enables the robot to disambiguate its pose. After that, the robot can then safely move 
to its goal location. Thus, the robot has to actively gather information while suffering 
a detour. This is an example of the most interesting scenario in probabilistic robotics: 
The robot's sensors pose intrinsic limitations as to what the robot knows. Similar situ- 
ations occur in locate-and-retrieve tasks, planetary exploration, and many other robot 
tasks. 


How can one devise an algorithm for action selection that can cope with this type of 
uncertainty? One might be tempted to solve the problem of what to do by analyzing 
each possible situation that might be the case under the current state of knowledge. 
In our example, there are two such cases: the case where the goal is on the upper 
left relative to the initial robot heading, and the case where the goal is on the lower 
right. In both these cases, however, the optimal policy does not bring the agent to a 
location where it would be able to disambiguate its pose. That is, the planning prob- 
lem in partially observable environment cannot be solved by considering all possible 
environments and averaging the solution. 


Instead, the key idea is to generate plans in belief space (sometimes referred to infor- 
mation space). The belief space comprises the space of all belief distributions b that 
the robot might encounter. The belief space for our example is shown in Figure ??. 
It reflects what the robot knows about the state of the world. The center diagram cor- 
responds to the case where the robot is ignorant of its heading direction, as indicated 
by the two question marks. As the robot enters any of the locations that reveal where 
there goal is, it will make a transition to one of the two diagrams at the border. Both of 
those correspond to cases where the robot pose is fully known: The robot faces north 
in the left diagram, and it faces south in the right one. Since the a priori chance of each 
location is the same, the robot will experience a random transition with a 5096 chance 
of ending up in either state of knowledge. 


The belief state is rich enough to solve the planning problem. In our toy example, the 
number of different belief states happens to be finite. This is usually not the case. In 
worlds with finitely many states the belief space is usually continuous, but of finite 
dimensionality. If the state space is continuous, the belief space is usually infinitely- 
dimensional. 


This example illustrates a fundamental property that arises from the robot's inability 
to sense the state of the world—one whose importance for robotics has often been 
under-appreciated. In particular, in uncertain worlds a robot planning algorithm must 
consider the state of knowledge (the belief state). In general it does not suffice to 
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consider the most likely state only. By conditioning the action on the belief state— 
as opposed to the most likely actual state—the robot can actively pursue information 
gathering. In fact, the optimal plan in belief state optimally gathers information, in that 
it only seeks new information to the extent that it is actually beneficial to the expected 
utility of the robot's action. This is a key advantage of the probabilistic approach to 
robotics. However, it comes at the price of an increased complexity of the planning 
problem. 


15.3 VALUE ITERATION 


We will now describe a first algorithm for action selection under uncertainty. The 
algorithm is a version of various flooding-type algorithms that recursively calculate 
the utility of each action relative to a payoff function. Value iteration, as discussed 
in this section, addresses only the first type of uncertainty: It devises control policies 
that can cope with the stochasticity of the physical world. It does not address the 
uncertainty arising from perceptual limitations, Instead, we will assume that the state 
of the world is fully observable at any point in time. 


15.3.1 Goals and Payoff 


Before describing a concrete algorithm, let us first define the problem in more concise 
terms. In general, robotic action selection is driven by goals. Goals might correspond 
to specific configurations (e.g., a part has been picked up by a robot arm), or they 
might express conditions over longer periods of time (e.g., a robot balances a pole). 
Some action selection algorithms carry explicit notions of goals; others use follow 
implicit goals. The algorithms discussed in this book all use explicit descriptions of 
goals. This enables them to pursue different goals. In contrast, control algorithm with 
implicit goals are usually unable to generate actions for more than just one goal. 


In robotics, one is often concerned with reaching specific goal configurations, while 
simultaneously optimizing other variables, often thought of as cost. For example, one 
might be interested in moving the end-effector of a manipulator to a specific location, 
while simultaneously minimizing time, energy consumption, jerk, or the number of 
collisions with surrounding obstacles. At first glance, one might be tempted to express 
these desires by two quantities, one that is being maximized (e.g., the binary flag 
that indicates whether or not a robot reached its goal location), and the other one 
being minimized (e.g., the total energy consumed by the robot). However, both can 
be expressed using a single function called the payoff function (also known as utility, 
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cost). The payoff, denoted c, is a function of the state. For example, a simple payoff 
function is the following: 


c(s) = 


{ +100 if robot reaches goal (15.1) 


—1 otherwise 


This payoff function rewards the robot with --100 if a goal configuration is attained, 
while it penalizes the robot by —1 for each time step where is has not reached that 
configuration. 


Why using a single payoff variable to express both goal achieval and costs? This is 
primarily because of two reasons: First, the notation avoids clutter in the formulae 
yet to come, as our treatment of both quantities will be entirely analogous throughout 
this book. Second, and more importantly, it pays tribute to the fundamental trade- 
off between goal achievement and costs along the way. Since robots are inherently 
uncertain, they cannot know with certainty as to whether a goal configuration has 
been achieved; instead, all one can hope for is to maximize the chances of reaching a 
goal. This trade-off between goal achieval and cost is characterized by questions like 
Is increasing the probability of reaching a goal worth the extra effort (e.g., in energy, 
time)? Treating both goal achieval and costs as a single numerical factor enables us to 
trade off one against the other, hence providing a consistent framework for selecting 
actions under uncertainty. 


We are interested in devising programs that generate actions so as to optimize future 
payoff in expectation. Such programs are usually referred to as control policies, de- 
noted 7: 


т: 00-0 —, дд (15.2) 


А policy 7 is a function that maps date into actions. Taking as general a view as 
possible, a policy might be an elaborate planning algorithm, or it might be a fast, 
reactive algorithm that bases its decision on the most recent data item only. The policy 
7 may deterministic or non-deterministic, and it might only be partially defined in the 
space of all data sets GO), 


An interesting concept in the context of creating control policies is the planning hori- 
zon. Sometimes, it suffices to choose an action so as to maximize the immediate next 
payoff value. Most of the time, however, an action might not pay off immediately. For 
example, a robot moving to a goal location will receive the final payoff for reaching 
its goal only after the very last action. Thus, payoff might be delayed. An appropriate 
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objective is then to choose action so that the sum of all future payoff is maximal. We 
will call this sum the cumulative payoff. Since the world is non-deterministic, the best 
one can optimize is the expected cumulative payoff, which is conveniently written as 


T 
OF = ЕУ a ous (15.3) 
T—l 


Here the expectation E| | is taken over future momentary payoff values c++, that the 
robot might accrue between time t and time t + Т. The individual payoffs c,,, are 
multiplied by an exponential factor у”, called the discount factor, which is constrained 
to lie in the interval [0; 1]. If y = 1, we have у” = 1 for arbitrary values of т, and 
hence the factor can be omitted in Equation (15.3). Smaller values of y discount 
future payoff exponentially, making earlier payoffs exponentially more important than 
later ones. This discount factor, whose importance will be discussed below, bears 
resemblance to the value of money, which also looses value over time exponentially 
due to inflation. 


We notice that CT is a sum of T time steps. T is called the planning horizon, or 
simply: horizon. We distinguish three important cases: 


1. T = 1. This is the greedy case, where the robot only seeks to minimize the im- 
mediate next payoff. While this approach is degenerate in that it does not capture 
the effect of actions beyond the immediate next time step, it nevertheless plays 
an important role in practice. The reason for its importance stems from the fact 
that greedy optimization is much simpler than multi-step optimization. In many 
robotics problems, greedy solutions are currently the best known solutions that 
can be computed in reasonable time. Obviously, greedy optimization is invariant 
with respect to the discount factor ~y, as long as y > 0. 


2. T'is finite (but larger than 1). This case is known as the finite-horizon case. Typi- 
cally, the payoff is not discounted over time, that is, гу = 1. One might argue that 
the finite-horizon case is the only one that matters, since for all practical purposes 
time is finite. However, finite-horizon optimality is often harder to achieve than 
optimality in the discounted infinite-horizon case. Why is this so? A first insight 
stems from the observation that the optimal action is a function of time horizon. 
Near the far end of the time horizon, for example, the optimal action might differ 
substantially from the optimal action earlier in time, even under otherwise iden- 
tical conditions (e.g., same state, same belief). As a result, planning algorithms 
with finite horizon are forced to maintain different plans for different horizons, 
which can add undesired complexity. 
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3. T is infinite. This case is known as the infinite-horizon case. This case does not 
suffer the same problem as the finite horizon case, as the number of remaining 
time steps is the same for any point in time (it's infinite!) However, here the 
discount factor ^y is essential. To see why, let us consider the case where we have 
two robot control programs, one that earns us $1 per hour, and another one that 
makes us $100 per hour. In the finite horizon case, the latter is clearly preferable 
to the former. No matter what the value of the horizon is, the expected cumulative 
payoff of the second program exceeds that of the first by a factor of a hundred. 
Not so in the infinite horizon case. Without discounting, both programs will earn 
us an infinite amount of money, rendering the expected cumulative payoff СТ 
insufficient to select the better program. 


Under the assumption that each individual c is bounded in magnitude (that is, 
|| < Cmax for some value Cmax), discounting guarantees that C^? is finite — 
despite the fact that the sum has infinitely many terms. In our example, we have 


C? = c+ 7e+7%c+7%eH+... (15.4) 
ES (15.5) 
Lew 


where c is either $1, or $100. Using, for example, a discount factor of y = 0.99, 
we find that our first program gives us a discounted return of C — $100, whereas 
the second results in C = $10,000. More generally, C is finite 7 as long as it 
is smaller than 1. An popular alternative to discounting involve maximizing the 
average payoff instead of the total payoff. Algorithms for maximizing average 
payoff will not be studied in this book. 


We will now introduce a few, useful variations on the basic notation. In particular, 
sometimes we would like to refer to the cumulative payoff CT conditioned on the 
state at time t being s. This will be written as follows: 


CT(s = E 


TH 
т=1 


The cumulative payoff CT is a function the robot's policy for action selection. Some- 
times, it is beneficial to make this dependence explicit: 


СТ = E 


T 
У 9 635g = set) (15.7) 


T—l 
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This notation enables us to compare two control policies л and 7’, and determine 
which one is better. Simply compare CT to CT, and pick the algorithm with higher 
expected discounted future payoff. 


Finally, we notice that the expected cumulative payoff is often referred to as value, to 
contrast them with the immediate state payoff denoted c. 


15.3.2 Finding Control Policies in Fully 
Observable Domains 


Traditionally, the robotics planning literature has investigated the planning problem 
predominately in deterministic and fully observable worlds. Here the assumption is 
that P(s' | a, s) and P(o | s) both are point mass distributions—which is a fancy 
way of saying that state transitions and observations are deterministic. Moreover, the 
measurement function P(o | s) is usually assumed to be bijective, which implies 
that the state s can be determined from the observation o. In such cases, a perfectly 
acceptable plan is a fixed sequence of actions that does not involve sensing during plan 
execution. Naturally, this case plays no role in the probabilistic setting, as it does not 
accommodate the inherent uncertainty in robotics. 


More realistic is the case where P(s’ | а, ѕ) is stochastic, but the state s is fully 
observable. Two reasons make it worthwhile to study this special case in some depth. 
First, it encompasses some of the uncertainty in robotics, making sensing an integral 
part of plan execution. Second, it prepares us for the more general case of partial 
observability in that it allows us to introduce the idea of value iteration. The framework 
for action selection in stochastic environments with fully observable state is known as 
Markov decision processes, abbreviated as MDPs. 


The problem that arises in MDPs is that of determining a policy 7 that maximizes the 
expected future payoff CT. Since the state is fully observable, it suffices to condition 
the policy on the current state s, instead of the entire data history as in (15.2). Thus, 
we assume the policy is of the form 


T:8—4dà (15.8) 


Why the policy can be conditioned on the most recent state only should be immedi- 
ately obvious. Formally, this follows directly from the Markov assumption defined in 
Chapter 2.4.4 of this book. 
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Let us now derive a basic algorithm for constructing such a policy known as value iter- 
ation. Let us begin with defining the optimal policy for a planning horizon of T' — 1, 
that is, we are only interested in a policy that maximizes the immediate next pay- 
off. The optimal policy is denoted v! (s) and is obtained by maximizing the expected 
payoff over all actions a: 


т!(з) = argmax f(s!) P(s'|a, s) ds’ (15.9) 


Thus, an optimal action is that that maximizes the immediate next payoff in expecta- 
tion, and the policy that chooses such an action is optimal. 


Every policy has an associated value function, which measures the expected value 
(cumulative discounted future payoff) of this specific policy. For 71, the value function 
is simply the expected immediate payoff, discounted by the factor y: 


C. = max | о) P(s'|a, s) ds (15.10) 


The definition of the optimal policy with horizon 1 and the corresponding value func- 
tion enables us to determine the optimal policy for the planning horizon T' — 2. In 
particular, the optimal policy for T' — 2 selects the action that maximizes the one-step 
optimal value C (s): 


n(s) = argmax f les?) + C! (s^)] P(s' |a, s) ds’ (15.11) 


It should be immediately obvious why this policy is optimal. The value of this policy 
conditioned on the state s is given by the following discounted expression: 


C(s) = max fles’) 4-C!(s)] P(s' |a, s) ds’ (15.12) 


Notice that the optimal policy and its value function for T' — 2 was constructed re- 
cursively, from the optimal value function for T' — 1. This observation suggests that 
for any finite horizon T' the optimal policy, and its value function, can be obtained 
recursively from the optimal policy and value function T' — 1. 
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This is indeed the case. We can recursively construct the optimal policy and the cor- 
responding value function for the planning horizon T from the optimal value function 
for T' — 1, via the following equation: 


a7(s) = argmax fles’) + CT-1(s')] P(s'|a, s) ds’ (15.13) 


The resulting policy 77 (s) is optimal for the planning horizon Т. Analogously, its 
value function is defined through the following recursion: 


a 


CT(s) = max flets’) + CT-1(s*)] P(s'|a, s) ds’ (15.14) 


Equations (15.13) and (15.14) give us a recursive definition of the optimal policy and 
optimal value function for any finite horizon T'. 


In the infinite horizon case, the optimal value function reaches an an equilibrium: 
С° (8) = max flets’) + C?* (s^)] P(s'|a, s) ds’ (15.15) 


This invariance is known as Bellman equation. Without proof, we notice that every 
value function C which satisfies the following recursion: 


C(s) — max f (es!) + C(s’)] P(s'|a, s) ds’ (15.16) 
is optimal, in the sense that the policy that is greedy with respect to C 
т(ѕ) = argmax fles’) + C(s')] P(s'|a, s) ds’ (15.17) 


maximizes the infinite-horizon payoff. 


15.3.3 Value Iteration 


This consideration leads to the definition of value iteration, a popular and decades-old 
algorithm for calculating the optimal infinite-horizon value function. Value iteration is 
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a practical algorithm for computing the optimal policy fully observable problems with 
finite state and action spaces. It does this by successively approximating the optimal 
value functions, as defined in (15.16). 


Let us denote the approximation by Ó Initially, the approximation is set to zero (or, 
alternatively, some large negative value): 


С — 0 (15.18) 


Value iteration then successively updates ће approximation via the following recur- 
sive rule, which computes the value of a state s from the best expected value one time 
step later: 


Q(s) — max fles’) + C(s')| P(s'|a, s) ds! (15.19) 


Notice that the value iteration rule bears close resemblance to the calculation of the 
horizon-T' optimal policy above. Value iteration converges if y < 1 and, in some 
special cases, even for y = 1. The order in which states are updated in value iteration 
is irrelevant, as long as each state is updated infinitely often. In practice, convergence 
observed after much smaller number of iterations. 


At any point in time, the value function e (s) defines the policy of greedily maximizing 


C (s): 
Ts) = argmax fles’) + C(s')| P(s'|a, s) ds’ (15.20) 


After convergence of value iteration, the policy that is greedy with respect to the final 
value function is optimal. 


Table 15.1 outlines the basic value iteration algorithm MDP value iteration. The 
value function is initialized in lines 2 and 3. Line 4 through 6 implement the recursive 
calculation of the value function. If the state space is discrete, the integral is replaced 
by a finite sum. Once value iteration converges, the resulting value function C induces 
the optimal policy. 
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1: Algorithm MDP value iteration(): 
2: for all s do 

Ó(s) = 0 
4: repeat until convergence 

for all s 


C(s) = max, f[c(s^) + Ó(s')] P(s'|a, s); ds! 


EE return C 


Table 15.1 The value iteration algorithm for MDPs with fi nite state and action spaces. 


15.3.4 Illustration 


Figure 15.2 illustrates value iteration in the context of a robotic path planning prob- 
lem. Shown there is a a two-dimensional projection of a configuration space of a 
circular robot. The configuration space is the space of all (x, y, Ө) coordinates that the 
robot can physically attain. For circular robots, the configuration space is obtained 
by 'growing' the obstacles in the map by the radius of the robot. These increased 
obstacles shown in black in Figure 15.2. 


The value function is shown in grey, where the brighter a location, the higher its value. 
Following the color gradient leads to the respective goal location, as indicated by the 
path shown in Figure 15.2. The key observation is that the value function is defined 
over the entire state space, enabling the robot to select an action no matter where it is. 
This is important in non-deterministic worlds, where actions have stochastic effects 
on the robot's state. 


The path planner that generated Figure 15.2 makes specific assumptions in order to 
keep the computational load manageable. For circular robots that can turn on the spot, 
it is common to compute the value function in x-y-space only, basically ignoring the 
cost of rotation. It is also quite common to ignore the robot's velocity, despite the 
fact that it clearly constrains where a robot can move. Obviously, path planners that 
plan in x-y-space are unable to factor the cost of rotation into the value function, and 
they cannot deal with robot dynamics (velocities). It is therefore common practice 
to combine such path planners with fast, reactive collision avoidance modules that 


436 CHAPTER 15 


(a) (b) 


Figure 15.2 Example of value iteration over state spaces in robot motion. Obstacles are 
shown in black. The value function is indicated by the grayly shaded area. Greedy action 
selection with respect to the value function lead to optimal control, assuming that the robot's 
pose is observable. Also shown in the diagrams are example paths obtained by following 
the greedy policy. 


generate motor velocities while obeying dynamic constraints. А path planner which 
considers the full robot state would have to plan in at least five dimensions, comprising 
the full pose (three dimensions), the translational and the rotational velocity of the 
robot. In two dimensions, calculating the value function for environment like the one 
above takes only a fraction of a second on a low-end PC. 


16 


PARTIALLY OBSERVABLE 
MARKOV DECISION PROCESSES 


16.1 MOTIVATION 


Let us now shift focus to the partially observable problem. The algorithms discussed 
thus far only address uncertainty in action effects, but they assume that the state of 
the world can be determined with certainty. For fully observable Markov decision 
processes, we devised a value iteration algoithm for controlling robots in stochastic 
domains. We will now be interested in the more general case where the state is not 
fully observable. Lack of observability means that the robot can only estimate a pos- 
terior distribution over possible world state, which we refer to as the belief b. This 
setting is known as Partially Observable Markov Decision Processes, or POMDPs. 


The central question addressed in the remainder of this chapter is the following: Can 
we devise planning and control algorithms for POMDPs? The answer is positive, but 
with caveats. Algorithms for finding the optimal policy only exist for finite worlds, 
where the state space, the action space, the space of observations, and the planmning 
horizon T are all finite. Unfortunately, these exact methods are computationally ex- 
tremely complex. For the more interesting continous case, the best known algorihms 
are all approximative. 


АП algorithms studied on this chapter build on the value iteration approach discussed 


previously. Let us restate Equation (15.14), which is the central update equation in 
value iteration in MDPs: 


CT(s) = max flets’) + CT-1(s')] P(s'|a, s) ds’ (16.1) 


a 


437 


438 CHAPTER 16 


In POMDPs, we apply the very same idea. Howewver, the state s it not observable. 
All we have is the belief state b, which is a posterior distributon over states. POMDPs, 
thus, compute a value function over belief spaces: 


CT(b) = max fiw) + CT-1(b')] P(b'|a, b) d (16.2) 
and use this value function to generate control: 
тТ(Ь) = argmax / [c(b’) + CT-1(0')] Р(Б |а, b) db! (16.3) 


Unfortunately, calculating value functions is more complicated in belief space than it 
is in state space. A belief is a probability distribution; thus, values C7 in POMDPs 
are functions over probability distributions. This is problematic. If the state space is 
finite, the belief space is continuous, since it is the space of all distributions over the 
state space. The situation is even more delicate for continuous state spaces, where the 
belief space is an infinitely-dimensional continuum. Furthermore, Equations (16.2) 
and (16.3) integrate over all beliefs b’. Given the complex nature of the belief space, 
it is not at all obvious that the integration can be carried out exactly, or that effective 
approximations can be found. 


Luckily, exact solutions exist for the interesting special case of finite worlds... 


This chapter is organized as follows. 


m Section 16.2 discusses a more general planning algorithm that addresses both 
types of uncertainty: Uncertainty in action and uncertainty in perception. This 
algorithm adopts the idea of value iteration, but applies it to a belief space rep- 
resentation. The framework underlying this algorithm is called Partially Observ- 
able Markov Decision Processes (POMDPs). The POMDP algorithm can antici- 
pate uncertainty, actively gather information, and explore optimally in pursuit of 
an arbitrary performance goal, at the expense of increased computational com- 
plexity. 


m Finally, Section 16.5 discusses augmented MDP algorithms, which are cross- 
overs between MDP and POMDP algorithms. Augmented MDP algorithms con- 
sider uncertainty, but abstract away detail and hence are computationally much 
more efficient than POMDP solutions. 
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16.2 FINITE ENVIRONMENTS 


We begin our consideration with the important special case of finite worlds. Here we 
assume that we have a finite state space, finitely many actions a at each state, a finite 
number of different observations o, and a finite planning horizon T'. Under these con- 
ditions, the optimal value function can be calculated exactly, as can the optimal policy. 
This is not at all obvious. Even if the state space is finite, there are in fact infinitely 
many possible beliefs. However, in this and the following section we establish a well- 
known result that the optimal value function is convex and composed of finitely many 
linear pieces. Thus, even though the value function is defined over a continuum, it can 
be represented on a digital computer—up to the accuracy of floating point numbers. 


16.2.1 An Illustrative Example 


An example of a finite world is shown in Figure 16.1. This specific example is ex- 
tremely simple and artificial, and its sole role is to familiarize the reader with the key 
issues of POMDPs before discussion the general solution. We notice that the world in 
Figure 16.1 possesses four states, labeled sı through s4, two actions, a; and а», and 
two observations, labeled 0; and о». The initial state is drawn at random from the two 
top states, s; and s». The robot is now given a choice: executing action ау, which will 
with high probability (but not always) teleport it to the respective other state. Alterna- 
tively, it can execute action a2, which results in a transition in one of the two bottom 
states, s3 or s4 with the probabilities as indicated. In 53 the robot will receive a large 
positive payoff, whereas in s4 the payoff is negative. Both of those states are terminal 
states, that is, once entered the task is over. Thus, the robot’s goal is to execute action 
à» when in state sı. If the robot knew what state it was in, performing the task would 
be extremely easy: Simply apply action a, until the state is 51, then apply action а». 
However, the robot does not know its state. Instead, it can perceive the two observa- 
tions оу and оз. Unfortunately, both observations are possible at both of these states, 
though the probability of observing one versus the other is different, depending on 
what state the robot is in. Since the robot does not know whether its initial state is sı 
Or 55, it must carefully keep track of past observations to calculate its belief. So the 
key policy questions are: When should the robot try action a2? How confident must 
it be, and how long will it take to try this action? These and other questions will be 
answered further below. 


Our goal in this section is to develop an algorithm for computing the optimal value 
function exactly for finite worlds with finite horizon T. Let us denote the planning 
horizon by T and the states by 5,..., Sn. Exploiting the finiteness of the state space, 
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observations: ol .7 in 51, 0.4 in 52 
observations: o2 .3 in 51, 0.6 in 52 


Figure 16.1 Finite State Environment, used to illustrate value iteration in belief space. 


we notice that the belief b(s) is given by n probabilities 


pi = Ыѕ= si) (16.4) 
with ¿ = 1,...,n. The belief must satisfy the conditions 
p 2 0 
Уу = d (16.5) 
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Because of the last condition, b(s) can be specified by n — 1 parameters pi, . .., Pn—1 
instead of n parameters. The remaining probability pn can be calculated as follows: 


P. = 1- Ур (16.6) 
t=1 


Thus, if the state space is finite and of size n, a belief is a (n — 1)-dimensional vector. 


In our example shown in Figure 16.1, it might appear that the belief is specified by 
three numerical probability values, since there are four states. However, the action 
аә separates (with certainty) the states {51,52} from the states (53, 54]. Thus, the 
only uncertainty that the robot may encounter is a confusion between states s; and 
вз, and a confusion between states 5з and s4. Both can be represented by a single 
numerical probability value, thus, the only continuous component of the belief state is 
one-dimensional. This makes this example convenient for plotting value functions. 


Let us now focus on the question as to whether we can calculate the optimal value 
function and the optimal policy exactly in finite domains. If not, we might be forced to 
approximate it. At first, one might conclude that calculating the optimal value function 
is impossible, due to the fact that the belief space is continuous. However, we observe 
that for finite worlds, the value function has a special shape: It is composed of finitely 
many linear pieces. This makes it possible to calculate the optimal value function 
in finite time. We also notice that the value function is convex and continuous—these 
latter two properties also apply to optimal value functions over continuous state spaces 
and infinite planning horizons. 


We begin our consideration with the immediate payoff of a belief state b. Recall 


that the payoff of b is given by the expectation of the payoff c under the probability 
distribution b: 


cb) = fas b(s) ds (16.7) 


Using the fact that b is uniquely specified by the probabilities p1, . . . , Pn, we can write 
c(d) = elpi,--.,Pn) = Ses) pi (16.8) 


i=l 


which is indeed linear in p1,... , Pn- 
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Figure 16.2 Expected payoff c(b) as a function of the belief parameter рз, assuming that 


the robot is either in state s3 or s4. 


It is interesting to plot the function c(b) for belief distributions over the two states s3 
and s4 in our example, the only states with non-zero payoff. The payoff in state 53 
is 100, whereas the payoff in state s4 is —100. Figure 16.2a shows the function c(b) 
for the subspace of the belief defined by (0,0, p3, 1 — рз) which is a belief space that 
places all probability in the states 5з and s4. Here the expectation c(b) is plotted as a 
function of the probability p3. Obviously, if рз = 0, the environment state is s4, and 
the payoff will be c(s4) — —100. On the other hand, if p3 — 1, the environment's 
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state is s3 and the payoff will be c(s3) = 100. In between, the expectation is linear, 
leading to the graph shown in Figure 16.2a. 


This consideration enables us to calculate the value function for planning horizon 
T = 1. From now on, we will only consider the subset of the belief space that places 
all probability on the two states s; and s2. This belief space is parameterized by a 
single parameter, pı, since pọ = 1 — pı and рз = ра = 0. The value function 
C1(b, аз) is constant zero for action ay: 


Cl(ba) = 0 (16.9) 


since whatever the true state of the robot, action a, will not lead it to a state that makes 
it receive non-zero payoff. This value function C1(b, ау) if graphed in Figure 16.2b. 
The picture becomes more interesting for action a». If the state of the environment is 
sı, this action will lead with 90% chance to state s3, where the robot will receive a 
payoff of 100. With 1096 probability it will end up in state s4, where its payoff will 
be —100. Thus, the expected payoff in state s; is 0.9 - 100 + 0.1 - (—100) = 80. 
By an analogous argument, the expected payoff in state s2 is —80. In between, the 
expectation is linear, yielding the value function 


С'(Ь, аз) = (80р: – 80р) = 72pi – 72р» (16.10) 


Here we use the discount factor у = 0.9. This function is shown in Figure 16.2c, for 
beliefs of the type (pı, 1 — pı, 0,0). 


So what is the right action selection policy? Following the rationale of maximizing ex- 
pected payoff, the best action depends on our current belief, assuming that it accurate 
reflects our knowledge about the real world. If the probability pı > 0.5, the optimal 
action will be a5, since we expect positive payoff. For values smaller than 0.5, the 
optimal action will be ај, since it avoids the negative expected payoff associated with 
action a». The corresponding value function 15 the maximum of the action-specific 
value functions: 


CP = max C" (b, a) 
= max{0; 72р — 72p2} (16.11) 


This value function and the corresponding policy for action selection is illustrated by 
the solid graph in Figure 16.2d, which maximizes the two linear components indicated 
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by the dashed lines. We notice that this value function is not linear any longer. Instead, 
it is plecewise linear and convex. The non-linearity arises from the fact that different 
actions are optimal for different parts of the belief space. 


From the value function in Figure 16.2d, can we conclude that for beliefs p; « 0.5 
there is no way to reap payoffs larger than 0? Of course, the answer is no. The 
value function C1(b) is only optimal for the horizon Т = 1. For larger horizons, it 
is possible to first execute action ат, followed by action a». Executing action a, has 
two beneficial effects: First, it helps us estimate the current state better due to the fact 
that the we can sense, and second, with high probability it changes the state from sı 
to sg or vice versa. Thus a good policy might be to execute action a, until we are 
reasonably confident that the state of the robot is sı. Then, the robot should execute 
action a». 


To make this more formal, let us derive the optimal value function for horizon T' — 2. 
Suppose we execute action ау. Then two things can happen: We either observe o, or 
02. Let us first assume we observe o1. Then the new belief state can be computed 
using the Bayes filter: 


2 
№ = m Р(о\|\) V; P(silar, si) pi 
i—l 


= m 0.7(0.1р + 0.8p2) 
nı (0.07p; + 0.56p2) (16.12) 


where 77; is the normalizer in Bayes rule. Similarly, we obtain for the posterior proba- 
bility of being in s2: 


2 


p, = m Р(о\|%) V ^ P(sdlar, si) pi 
1=1 


1 0.4(0.9p1 + 0.2p2) 
= n (0.36p1 + 0.08p2) (16.13) 


Since we know that ру + p = 1—after all, when executing action a; the state will 
either be s; or s9—we obtain for nı: 


1 1 
=. == == 16.14 
0.07р + 0.56p2 + 0.36р + 0.08p» 0.43p; + 0.64p» ( ) 


71 
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We also notice that the variable 7; is a useful probability: It is the probability of 
observing o after executing action ау (regardless of the posterior state). 


We now have expression characterizing the posterior belief given that we execute ac- 
tion a, and observe оу. So what is the value of this belief state? The answer is ob- 
tained by plugging the new belief into the value function C'! (b) as defined in Equation 
(16.11), and discounting the result by y: 


C7(b,a1,01) = y max{0; 72p — 72р} 


= 0.9 max{0; 72 т (0.36p; + 0.08p2 — 0.43pi — 0.64p2)} 
0.9 max(0 ; 72 т (—0.07p1 — 0.56p2) } 


= 0.9 max(0 ; т (—5.04p, — 40.32p3)] 
We now move the factor 7; out of the maximization and move 0.9 inside, and obtain: 
C? (b, a1, 01) = т max{0; —4.563p; — 36.288p5] (16.16) 


which is a piecewise linear, convex function in the parameters of the belief b. 


The derivation for observing o» after executing action a, is completely analogous. In 
particular, we obtain the posterior 


12 0.3(0.1p1 = 0.8рә) = T] (0.03p1 s 0.24рә) 
n2 0.6(0.9p1 + 0.2p2) = т (0.54p + 0.12p3) (16.17) 


p, 
Po 


with the normalizer 


1 
TIU RN ER 16.18 
s 0.57р + 0.36рә ge’) 


The corresponding value is 


C7(b,a1,01) = y max{0; 72p, — 72p5) 
= ny max{0; —33.048p, + 7.776p2} (16.19) 


0.9 max(0 ; 72 т (0.36р + 0.08p2) — 72174 (0.43p4 + 0. 


б4р» ) } 


(16.15) 
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As an aside, we also notice that 7; 1+ Na 1 — 1. This follows directly from the fact 
that the normalizer in Bayes filters is the observation probability 


1 
; = EN 16.2 
"^ = "Pla, eru 


and the fact that there are exactly two possible observations in our example, оу and o». 


Let us now calculate the value C?(b, a1), which is the expected value upon executing 
action а. Clearly, the value is a mixture of the values C? (b, a1, 01) and C? (b, a1, 02), 
weighted by the probabilities of actually observing o, and o», respectively. Put into 
mathematical notation, we have 


2 
C?^(ba)) = M;C"(bai o;)P(oi]ai, b) (16.21) 


i=l 


The terms C?(b, аз, о) were already defined above. The probability P(o;|a1, b) of 
observing o; after executing action ау is 7; 1. Thus, we have all the ingredients for 
calculating the desired value C?(b, a1): 


C7(b,a1) = no! т max(0; —4.563p, — 36.288p2} 
+15: по max(0 ; —33.048pi + 7.776p2} (16.22) 
= max{0; —4.563p, — 36.288p2} + max(0 ; —33.048pi + 7.776p2} 


This expression can be re-expressed as the maximum of four linear functions: 


C2(b,a,) = max{0; —4.563p, — 36.288p» ; (16.23) 
—33.048р + 7.776p2 ; —37.611р — 28.512p3] 


Figure 16.2e shows those four linear functions for the belief space (p1, 1 — p», 0,0). 
The value function C?(b, a1) is the maximum of those four linear functions. As is 
easy to be seen, two of these functions are sufficient to define the maximum; the other 
two are smaller over the entire spectrum. This enables us to rewrite C?(b, a1) as the 
maximum of only two terms, instead of four: 


С?(Ь,ај) = max{0; —33.048p, + 7. 776p) (16.24) 
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Finally, we have to determine the value C? (b), which is the maximum of the following 
two terms: 


C*(b) = max{C°(b, a1), С?(Ь,аз)} (16.25) 


As is easily verified, the second term in the maximization, C? (b, a2), is exactly the 
same as above for planning horizon T' — 1: 


C7(bja2) = QOl(b,ag) = 72p, — 72р» (16.26) 
Hence we obtain from (16.24) and (16.26): 
С?(Ь) = max{0; —33.048p, + 7.776p2 ; 72р — 72p2} (16.27) 


This function is the optimal value function for planning horizon T. Figure ?? graphs 
C? (b) and its components for the belief subspace (p1,1 — p2,0,0). As is easy to be 
seen, the function is piecewise linear and convex. In particular, it consists of three 
linear pieces. For beliefs under the two leftmost pieces (1.е., ру < 0.5), a4 is the 
optimal action. For p, — 0.5, both actions are equally good. Beliefs that correspond 
to the rightmost linear piece, that is, beliefs with p; > 0.5, have the optimal action 
ag. If а is the optimal function, the next action depends on the initial point in belief 
space and on the observation. 


The value function C?(b) is only optimal for the horizon 2. However, our analysis 
illustrates several important points. 


First, the optimal value function for any finite horizon is continuous, piecewise linear, 
and convex. Each linear piece corresponds to a different action choice at some point 
in the future, or to a different observation that can be made. The convexity of the 
value function indicates the rather intuitive observation, namely that knowing is al- 
ways superior to not knowing. Given two belief states b and b’, the mixed value of the 
belief states is larger or equal to the value of the mixed belief state, for some mixing 
parameter 8 with 0 < 8 < 1: 


8C(b--(1-8)C(F) > С(8ь+ (1— B)V) (16.28) 


Second, the number of linear pieces can grow tremendously, specifically if one does 
not pay attention to linear functions becoming obsolete. In our toy example, two 
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out of four linear constraints defining C?(b, a1) were not needed. The ‘trick’ of ef- 
ficiently implementing POMDPs lies in identifying obsolete linear functions as early 
as possible, so that no computation is wasted when calculating the value function. 
Unfortunately, even if we carefully eliminate all unneeded linear constraints, the num- 
ber of linear functions can still grow extremely rapidly. This poses intrinsic scaling 
limitations on the exact value iteration solution for finite POMDPs. 


16.2.2 Value Iteration in Belief Space 


The previous section showed, by example, how to calculate value functions in finite 
worlds. Let us now return to the general problem of value iteration in belief space. In 
particular, in the introduction to this chapter we stated the basic update equation for 
the value function, which we briefly restate here: 


СТ) = max |. [c(b’) + СТ-1(Ь/)| Р(Ы|а,Ь) d (16.29) 


а 


In this and the following sections, we will develop a general algorithm for value it- 
eration in belief space that can be implemented on a digital computer. We begin by 
noticing that Equation (16.2) suggests an integration over all beliefs, where each belief 
is a probability distribution. If the state space is finite and the number of states is n, 
the space of all probability distributions is a continuum with dimension n — 1. To see, 
we notice that n — 1 numerical values are required to specify a probability distribution 
over n discrete events (the n-th parameter can be omitted since probabilities add up to 
1). If the state space is continuous, the belief space possesses infinitely many dimen- 
sions. Thus, integrating over all belief appears to be computationally daunting a task. 
However, we can avoid this integration by reformulating the problem and integrating 
over observations instead. 


Let us examine the conditional probability P('|a, b), which specifies a distribution 
over posterior beliefs b’ given a belief b and an action a. If only b and a are known, the 
posterior belief b’ is not unique, and P(b'|a, b) is a true probability distribution over 
beliefs. However, if if we also knew the measurement o’ after executing action a, the 
posterior b’ is unique and P(b'|a, b) is a degenerate point-mass distribution. Why is 
this so? The answer is provided by the Bayes filter. From the belief b before action 
execution, the action a, and the subsequent observation o', the Bayes filter calculates 
a single, posterior belief b’ which is the single, correct belief. Thus, we conclude that 
if only we knew o', the integration over all beliefs in (16.2) would be obsolete. 
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This insight can be exploited by re-expressing 
P(t|a,0) = (КО? о) P(o'|a, b) do! (16.30) 


where P(V/|a, b, o') is a point-mass distribution focussed on the single belief calcu- 
lated by the Bayes filer. Plugging this integral into the definition of value iteration 
(16.2), we obtain 


СТО) = шах / / [c(5^) + С7106)] P(U |a, b, o") db P(o' |a, b) 1916.31) 
The inner integral 
/ [с(Ь) + CT 1 (0)] Р(Ы|а,Ь,о') db! (16.32) 


contains only one non-zero term. This is the term where b’ is the distribution calculated 
from b, a, and o' using Bayes filters. Let us call this distribution D (b, a, o'), that is, 


B(ba,o)(s) = P(so'a,b) 
Р(0'|8', a, b) P(s'|a, b) 
P(o'|a, b) 


= POD P(o'|s^) J Ре) P(s|a, b) ds 


= ———— Р(о'|5”) J Рен) b(s) ds (16.33) 
a, b) 


The reader should recognize the familiar Bayes filter derivation that was extensively 
discussed in Chapter 2, this time with the normalizer made explicit. We notice that 
the normalizer, P(o'|a, Б), is a factor in the value update equation (16.31). Hence, 
substituting B(b, a, o") into (16.31) eliminates this term, which leads to the recursive 
description of the value iteration algorithm: 


СТ) = max flea.) + CT! (B(b, a, o"))] P(o'|a, b) do’ (16.34) 
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This form is more convenient than the one in (16.2), since it only requires integration 
over all possible measurements o’, instead of all possible belief distributions b’. This 
transformation is a key insight for all value iteration algorithms derived in this chap- 
ter. In fact, it was used implicitly in the example above, where a new value function 
was obtained by mixing together finitely many piecewise linear functions in Equa- 
tion (16.22). 


Below, it will be convenient to split the maximization over actions from the integration. 
Hence, we notice that that (16.34) can be rewritten as the following two equations: 


C (b, a) 


Јова.) + СТ-1(В(Ь,а,0'))| Р(о |а, Б) do’ (16.35) 


СТЫ) = шах CT (b, a) (16.36) 


Here CT (b, a) is the horizon T value function over the belief b assuming that the 
immediate next action is a. 


16.2.3 Calculating the Value Function 


In our example, the optimal value function was piecewise linear and convex. Is this 
specific to this example, or is this a general characteristic of value functions in finite 
worlds? Luckily, it turns out that the latter is the case: АП optimal value functions in 
finite POMDPs with finite horizon are piecewise linear and convex. Piecewise linear- 
ity means that value function CT is represented by a collection of linear functions. If 


CT was linear, it could be represented by set of coefficients CT ,..., CT: 
СТЫ) = СТ(р,...,рә„) = X 0n (16.37) 
i=1 
where as usual, p1,..., Dn are the parameters of a belief distribution. As in our ex- 


ample, a piecewise linear and convex value function CT (b) can be represented by the 
maximum of a collection of K linear functions 


T = go 
CT(p = шах у Cf ps (16.38) 


i=1 
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where CE T JOR n denote the parameters of the k-th linear function, and K de- 
notes the number of linear pieces. The reader should quickly convince herself that the 
maximum of a finite set of linear functions is indeed a convex piecewise linear, convex 
function. 


In value iteration, the initial value function is given by 
С° = 0 (16.39) 


We notice that this value function is linear, hence by definition it is also piecewise 
linear and convex. This assignment establishes the (trivial) baseline of the recursive 
value iteration algorithm. 


We will now derive a recursive equation for calculating the value function С? (b). This 
equation assumes that the value function one time step earlier, C ^! (b), is represented 
by a piecewise linear function as specified above. As part of the derivation, we will 
show that under the assumption that C7 1 (b) is piecewise linear and convex, C7 (b) is 
also piecewise linear and convex. Induction over the planning horizon T' then proves 
that all value functions with finite horizon are indeed piecewise linear and convex. 


Equations (16.35) and (16.36) state the following: 


CT(b,a) = Јова.) + СТ-1(В(Ь,а,0'))| P(o'|a,b) do! (16.40) 
СТ) = max CT (b, a) (16.41) 


In finite spaces, all integrals can be replaced by finite sums, and we obtain: 


CT(ba) = 22 B(b,a,0')) + C11 (B(b,a,o'))| P(o'|a,b) (16.42) 


СТ) = max C" (0, a) (16.43) 


The belief B(b, а, 0’) is obtained using the following expression, which ‘translates’ 
Equation (16.33) to finite spaces by replacing the integral by a sum: 


B(b,a, o )(s^) P(o'|s’) PER (s'|a, s) b (16.44) 


FI 
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If the belief b is represented by the parameters (pi, . . . , Pn }, and the belief B(b, a, o^) 
by {p}, ..., p) the i-th parameter of the belief b’ is computed as follows: 


p = PS Р(0'|8!) P» ‘la, 84) (16.45) 


The reader may recognize the discrete Bayes filter that was already discussed in length 
in Chapter 4.1, where introduced the basic filtering theory. 


To compute the value function CT (b, а), we will now derive more practical expres- 
sions for the terms c(B(b, a, o')) and CT-!(B(b, a, o')), starting with the one in 
(16.42). With our parameterization B(b,a,o') = {p1,..., ph}, we obtain for the 
term c(B(b, a, 0’)): 


с(В(Ь,а,0)) = epum) = Sle, (16.46) 


Substituting (16.45) into this expression gives us 


n 


c(B(b,a,o)) = De ira БОКЕ P(o'|s;) ут (sila, sj) 
i=l 


= PU 7 Yan (o |s]) P» (sila, ву) p; (16.47) 


The latter transformation is legal since the term P(o'|a, b) does not depend on i. This 
form still contains an expression that is difficult to compute: P(o'|a, b). However, the 
beauty of the finite case is that this expression cancels out, as we will see very soon. 
Hence, it does not have to be considered any further at this point. 


The derivation of the term CT-!(B(b,a,o')) in (16.42) is similar to that of 
c(B(b, a, o')) above. In particular, we have to replace the the immediate payoff func- 
tion c by the the value function C71, which is only piecewise linear and convex. This 
gives us 


CT-1(B(ba,o) = OCT (p,...,ph) = «рк etn pP, (1648) 
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for sets of linear parameters function C ESO 1. As above, we now substitute the defini- 
tion of the posterior belief В (see (16. 45)) into this expression, and obtain: 


TL 


= max O oT BULL Р(0 |8!) P» s/|a, s;) (16.49) 


i=1 


The term P(o'|a,b)~! can be moved out of the summation and the maximization, 
since it does not depend on i or k: 


= PULS sc *P(o'|si) У? P(sila, s;) (16.50) 
j=l 


It may not be immediately obvious that this expression is also piecewise linear and 
convex. However, reordering the terms inside the summation gives us the following 
expression: 


~ Р(оа,ъ) TU B3 » [Xn > Chi’ P(o'|s;) Р(в,|а, »)) (16.51) 


=]. 


In this form, it is easy to verify that that for any fixed value of k, the argument of 
the maximization is indeed linear in the parameters p;. The maximum of those linear 
pieces is, thus, piecewise linear and convex. Moreover, the number of linear pieces is 
finite. 


Let us now return to the main problem addressed in this section, namely the problem 
computing CT (b, a), the value function for horizon T. We briefly restate Equation 
(16.42): 


CT(b,a) = 12, le В(Ь, а,0')) + CT (B(b,a,o))] P(o'|a,b) (16.52) 


Let us first calculate the sum c(B(b, а, 0')) + CT -1(B(b, a, о”)), using Equations 
(16.47) and (16.50): 


c(B(b, a, o")) + CT-!(B(b, а,0!)) 
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т 


1 т 
= Pola; À“ P(o'|si) XC P(sila.si) pj 


i=1 j=1 


1 


T—1 ‘or / 
"Puls 5 mex S Ot P(e|s) XO P(sila, s;) р; (16.53) 


i=1 j=1 
With some reordering, we obtain 


c(B(b, a, o")) + QU OB а,0!)) 


zz 1 n 1) А = 
= Paay У (с + СЕГ1)Р(0'|8;) н sja,s;) pj| (16.54) 


i=1 


Substituting this expression into Equation (16.52) gives us: 


P(o|s;) V^ P(sila,s;) pj | + P(o'|a,b) (16.55) 


We notice that the term P(o’|a, Б) appears in enumerator and in the denominator of 
this expression, thus cancels out: 


CT(ba) = quA У (а + Ch )P( (o'|s;) ) $E s;|a, sj) p; (16.56) 
= j=1 


This cancellation is an essential characteristic of the POMDP solution in finite worlds, 
which accounts for a simpler update equation when compared to the general solution 
described in the previous section. The desired value function is then obtained by 
maximizing CT (b, a) over all actions a: 


CT(b = max C" (b, a) (16.57) 
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As is easily verified, C (b) is indeed piecewise linear and convex in the belief pa- 
rameters p1,..., p». In particular, for each fixed k the term inside the large brackets 
in (16.56) is linear. The maximum of those K linear functions is a piecewise linear 
function that consist of at most K linear pieces. The sum of piecewise linear, con- 
vex functions is again piecewise linear and convex. Hence the summation over all 
observations o' produces again a piecewise linear convex function. Finally, the max- 
imization over all actions in Equation (16.57) again results in a piecewise linear and 
convex function. Thus, we conclude that С' (b) is piecewise linear and convex for апу 
finite horizon T'. 


16.2.4 Linear Programming Solution 


How can we turn the mathematical insight into an algorithm that can be carried out 
on a digital computer in finite time? As in our example above, the key insight is that 
solutions to equations like the one above can be calculated using linear programming. 
Linear programming provides solutions for optimization problems under linear con- 
straints. 


Suppose we know that 


C = max (а) (16.58) 


a:l<a<m 


for some fictitious set of values x(a), and some positive integer m. A solution to this 
equation can be obtained by a linear program which possesses a collection of m linear 
constraints, one for each possible value of a: 


Ф = {C> а(а)} (16.59) 


The value of C is then obtained by minimizing С under these constraints, which is a 
standard linear programming problem. Now suppose we are given the slightly more 
complicated expression 


С = max 2(а,1) (16.60) 


for some fictitious values x(a,4) and some positive integer n. How can we obtain 
the solution to this problem by a linear program? Clearly, for each т there might be 
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a different a that maximizes the inner term x(a, 2). The solution to this problem is 
therefore attained for a specific set of values for a, one for each element in the sum. 
Let us for a moment assume we knew those values of a. If we denote this set by 
a(1),...,a(n), we have 


C = > s(a(i),i) (16.61) 
which gives us the single linear constraint 
Ф = (C2MY:x(a(i)i)) (16.62) 


Unfortunately, we do not know the values of a(i) where C attains its maximum. The 
linear programming solution for (16.60) contains therefore one constraint for any such 
sequence a(i): 


U {С > 2: z(a(i),i)) (16.63) 


a(i):1<a(i)<m,1l<i<n 


The total number of constraints is m”, since there are n free variables a(i) which 
each can take on m different values. This constraint set ensures that the maximizing 
constraint(s) is always included. Minimizing C under these constraints then generates 
the solution of (16.60). 


We also notice that it is common practice to divide the set of constraints into active and 
passive constraints, depending on whether they actively constrain the solution for C. 
In this particular example, active constraints are those where the maximum max, x(a) 
is attained; whereas passive constraints correspond to smaller x(a)-values. If we are 
interested in the value of a that maximizes an expression (the ‘argmax’), we simply 
choose an a whose corresponding constraint is active. Active constraint are easily 
identified using any of the existing linear programming algorithms. 


These insights enable us to provide a concrete algorithm for calculating the optimal 
value function and the optimal policy. Initially, for horizon T' = 1, the value function 
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is defined by (16.39), which we briefly restate here: 
QU = + У c(si) pi (16.64) 
i=1 
The corresponding constraint set contains only a single constraint: 
Фф! = I0te ВСС Pi} (16.65) 
i=1 


Minimizing C? under the constrain set Ф! gives the desired solution. 


Let us now consider how to construct the constraint set Ф from the set of constraints 
фТ-1, According to Equations (16.56) апа (16.57), the desired value function is 


TL 


ОСТ) = ymax у шах У Vei CL P(o']s;) 2, P(sila s) v; 
о! J= 


i=l 


(16.66) 


which is similar in shape to the linear constraints discussed above (Equation (16.60)). 
For each action a, we therefore have to generate to following set of constraints: 


U 


[ero <75 
а k(o)d&k(o)&|9T-1| о 


TL 


X (ci + Ciona) Pols) У Pila sj) р; (16.67) 
j=1 


1=1 


which are loj ^ i linear constraints. Here |O| denotes the number of observations, 
and |Ф7—!| is the number of constraint set obtained for horizon Т — 1. The total 
number of constraints is multiplicative in the number of actions |A|, thus our simple 
solution obtains 


ФТ| = JA} o] (16.68) 
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1: Algorithm finite world POMDP(T): 

2 Ф = {Ct < y E; elsi) pi} 

3: For each t from 2 to T 

4: ot = 

5: 1= 0 

6: For each action a 

7: For each k(1),...,k(|O|) with 1 < k(o^) < [Ф 0,1 < o < |O| 
8: CF, = Dig у (ЕС ПРЕ) D8, P(slla, sj) 
9: Ф = GU {(a,C € У) CEPI) 

10: l=l+1 

11: return ФТ 


Table 16.1 The POMDP algorithm for discrete worlds. This algorithm represents the 
optimal value function by a set of constraints, which are calculated recursively. 
constraints for horizon Т, with the boundary condition |Ф!| = 1. 


Table 16.1 depicts the value iteration algorithm for finite POMDP. 


16.3 GENERAL POMDPS 


The previous section derived a value iteration algorithm for POMDPs in finite worlds. 
In particular, this algorithm requires finite numbers of states, observations, and actions, 
and it also requires a finite horizon T. 


[...] 


There are still terms in (16.35) that require further consideration. First, the immediate 
payoff function c has thus far only been defined over states. Here we need to calculate 
it for belief distributions. The payoff of a belief b is simply the expectation of the 
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per-state payoff c(s) under the belief b(s): 


cb) = Pld = fa b(s) ds (16.69) 


It might seem odd that the payoff depends on the robot's belief. However, the robot's 
belief represents the true posterior over world states, which justifies the expectation. 


Second, we need to derive an expression for the probability P(o'|a, b), which is the 
distribution of observations o' one might receive after executing b in a state distributed 
according to b. Deriving this expression is a straightforward mathematical exercise. 
In particular, let us first integrate over the state 5” at which the observation o' is made: 


Po la. by, = EODD s^) P(s'|a, b) ds’ (16.70) 


Thus, s' refers to the state after action execution. As usual, we exploit the Markov 
property to simplify this expression 


Зе / Р(0|5') P(s'la, b) ds" (16.71) 
If we now integrate over states s before action execution: 

д / P(o'|s’) J Pida Pelat dade! (16.72) 
we obtain the simplified term 

г / P(o'|s') i P(s'|a, s) b(s) ds ds! (16.73) 


Armed with the necessary expression, we can now return to the original question of 
how to perform value iteration in belief space and give a concrete algorithm. In partic- 
ular, let us substitute (16.73) back into (16.34), which leads to the following recursive 
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definition of the value function in belief space: 


CT(b = max f (|ва) f (=) B(b, a, 0')(s’) d 
/ P(o'|s^) J P(s'|a, s) b(s) ds as’) do' (16.74) 


This recursive definition defines the optimal value function for any finite horizon T'. 
The optimal value function for infinite horizons is chacterized by Bellman's equation: 


C*(b = max | E / с(8') B(b, a, o')(s) as 
if P(o'|s’) / P(s'|a, s) b(s) ds aj) do! (16.75) 


As in the MDP case, value iteration approximates С'° (b) using a function C by repet- 
itively applying this update equation. The central value iteration algorithm is therefore 
as follows: 


Ĉ(b) — max | ( [б(вф,,о)) + / с(8') B(b,a,o')(s') as! 
/ P(o'|s’) ii P(s'|a, s) b(s) ds as’) do' (16.76) 


This update equation generalizes value iteration to belief spaces. In particular, it pro- 
vides a way to backup values among belief spaces using the familiar motion model 
and the familiar perceptual model. The update equation is quite intuitive: The value 
of a belief b is obtained by maximizing over all actions a. To determine the value of 
executing action a under the belief b, Equation (16.76) suggests to consider all ob- 
servations o'. One such an observation is fixed, the posterior belief is calculated via 
Bayes filtering, and the corresponding value is calculated. Thus, what is missing is a 
calculation of the probability of measuring o'. In Equation (16.76), this probabilist is 
obtained by integrating over all states s, all subsequent states s', and weighing these 
state combinations by the corresponding probabilities under the belief b and the mo- 
tion model P(s'|a, s). 
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1: Algorithm POMDP(7): 

2: for all b do 

3: С) = 0 

4: repeat until convergence 

5: for all b do 

6: C* — —oo 

Ti for all actions a do 

8: Ca = f ([6(8(0,а,0)) + f es’) B( a, 0')(s") ds'| 


fS P(o'|s') f P(s'|a, s) b(s) ds а») do' 


9: if Ca > C* 
10: C* = Ch 
11: C(b) = C* 

12: return C 


Table 16.2 The general POMDP algorithm with fi nite horizon T, where the function 
B(b, a, о!) is as specifi ed in the text. This version leaves open as to how the value function 
Cis represented, and how the integrals in Step 8 are calculated. 


16.3.1 The General POMDP Algorithm 


Table 16.2 depicts a general algorithm for value iteration in belief space. This algo- 
rithm is only an in-principle algorithm that cannot be implemented on a digital com- 
puter, since it requires integration over infinite spaces, and lacks a specification as to 
how the value function C is represented. The algorithm accepts the planning horizon 
T as input, which is assumed to be finite. As in the state-based version of value it- 
eration, the value function is initialized by 0 (lines 2 and 3 in Table 16.2). Lines 6 
through 11 identify the best action for a belief state b relative to the value function 
C. The central update, in line 8, is equivalent to the argument of the maximization in 
Equation (16.76). 
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The most important remaining question concerns the nature of C. To perform value 
iteration in belief space, we need a way to represent a value function over beliefs. 
Beliefs are probability distributions. Thus, we need a method for assigning values 
to probability distributions. This, unfortunately, is a difficult problem that lacks a 
general solution. As noticed above, belief spaces over continuous state spaces pos- 
sess infinitely many dimensions, suggesting that the value function is defined over an 
infinitely-dimensional space. Even for finite state spaces, the value function is defined 
over a continuum, making it questionable as to whether the optimal value function can 
be represented on a digital computer. However, it turns out that the value function can 
be calculated exactly for the important special case of finite problems, in particular, 
for problems with finite state, action, and observation spaces, and with finite planning 
horizon. An exact such algorithm will be give in the next section. 


16.4 A MONTE CARLO 
APPROXIMATION 


Let us now study a concrete algorithm that approximates the general POMDP algo- 
rithm, and that has shown promise in practical applications. The basic idea is to ap- 
proximate the belief state using particles, using the particule filter algorithm for calcu- 
lating beliefs. Particule filters were already extensively discussed in various chapters 
of this book. They were mathematically derived in Chapter 4.2.1. 


16.4.1 Monte Carlo Backups 


For the sake of completeness, let briefly review the basic update equations. Initially, 
N random samples are drawn from the initial belief distribution 00). Then, at time t, 
a new set of weighted particles is generated. 


Table 16.3 shows a variant of the particle filter algorithm that accepts an action a, an 
observation o’, and a belief b as input. It then transforms the belief b into a new belief. 
This algorithm implements the function B(b, a, o"), assuming that belief states are 
represented by weighted sets of particles. It is trivially obtained from the algorithm 
particle filter in Table 4.3 on page 4.3 of this book. See Chapter 4.2.1 for a more 
detailed explanation. 


Armed with B(b,a,o"), the update equation can be implemented using a straight- 
foward Monte Carlo algorihm. Recall that value function in belief space is defined by 


Partially Observable Markov Decision Processes 463 


1: Algorithm particle filter 2(b, a, o"): 

2: =o 

3: do N times: 

4: sample (s, p) from b according to p,,...,pn inb 
5: sample s' ~ P(s'|a, s) 

6: w' = P(o'|s") 

T: add (s', и?) to b' 

8: normalize all weights w' € b' 

9: return b! 


Table 16.3 A variant of the particle fi lter algorithm, which accepts an action a and an 
observation o’ as input. 


the following recusive equation: 


ОТ) = max f E B(b, а, 0')(8') d 
/ P(o'|s^) | P(s'|a, в) b(s) ds aj) do' (16.77) 


Inner projection loop: 


1: Algorithm MCPOMDP inner loop(b): 


CT (b) = –оо 
for all a do 
CT (b, a) = 0 
do N times 
Sample s ~ b 


DX Uu E 


Sample s' ~ P(s'|a, 5) 
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8: Sample o' ~ P(o'|s’) 
Э Ы = particle filter. 2(b, а, 0’) 
10: ОТ (b) = CT (b) + x y[CT (V) + с(5/)] 
11: if CT (b, a) > CT (b) 
12: CT (b) = CT (b, a) 
13: return C? (b) 
Outer McPOMDP loop: 
1: Algorithm MCPOMDP outer loop(b): 
2: for all b do 
3: C(b) = 0 
4: repeat until convergence 
5: for all b 
6: C(b) = MCPOMDP inner loop(b) 
T: return C 
1: Algorithm MCPOMDP alterative outer 100р(0): 
2: for all b do 
3: Ó(b) =0 


i repeat until convergence 
5: b= P(so) 


s ~ P(so) 
T: do until trial over 
8: C(b) = MCPOMDP inner loop(b) 
9: ifrand(0,1) > 0.1 
10: a = argmax, C(b, a) 
11: else 


12: select a at random 
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13: draw s' ~ P(s'|a, s) 
14: Ss = s! 
15: return C 


16.4.1.1 Learning Value Functions 


Following the rich literature on reinforcement learning [16, 38], our approach solves 
the POMDP problem by value iteration in belief space. More specifically, our ap- 
proach recursively learns a value function Q over belief states and action, by backing 
up values from subsequent belief states: 


DL — E [В(оӊа) + y шах Q(9).3,a) (16.78) 


Leaving open (for a moment) how Q is represented, it is easy to be seen how the al- 
gorithm particle projection can be applied to compute a Monte Carlo approximation 
of the right hand-side expression: Given a belief state 0; and an action a;, parti- 
cle projection computes a sample of R(o¢+1) and 0,,1, from which the expected 
value on the right hand side of (16.78) can be approximated. 


It has been shown [3] that if both sides of (16.78) are equal, the greedy policy 


09(0) = argmaxQ(6,a) (16.79) 


is optimal, i.e., с* = c9. Furthermore, it has been shown (for the discrete case!) that 
repetitive application of (16.78) leads to an optimal value function and, thus, to the 
optimal policy [44, 8]. 


Our approach essentially performs model-based reinforcement learning in belief space 
using approximate sample-based representations. This makes it possible to apply a 
rich bag of tricks found in the literature on MDPs. In our experiments below, we use 
on-line reinforcement learning with counter-based exploration and experience replay 
[21] to determine the order in which belief states are updated. 


16.4.1.2 Nearest Neighbor 


We now return to the issue how to represent Q. Since we are operating in real-valued 
spaces, some sort of function approximation method is called for. However, recall 
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Figure 16.3 (a) The environment, schematically. (b) Average performance (reward) as 
a function of training episodes. The black graph corresponds to the smaller environment 
(25 steps min), the grey graph to the larger environment (50 steps min). (c) Same results, 
plotted as a function of number of backups (in thousands). 


that Q accepts a probability distribution (a sample set) as an input. This makes most 
existing function approximators (e.g., neural networks) inapplicable. 


In our current implementation, nearest neighbor [24] is applied to represent Q. More 
specifically, our algorithm maintains a set of sample sets 0 (belief states) annotated 
by an action a and a Q-value Q(0, a). When a new belief state 6’ is encountered, its 
Q-value is obtained by finding the k nearest neighbors in the database, and linearly 
averaging their Q-values. If there aren't sufficiently many neighbors (within a pre- 
specified maximum distance), 0' is added to the database; hence, the database grows 
over time. 


Our approach uses KL divergence (relative entropy) as a distance function!. Techni- 
cally, the KL-divergence between two continuous distributions is well-defined. When 
applied to sample sets, however, it cannot be computed. Hence, when evaluating the 
distance between two different sample sets, our approach maps them into continuous- 
valued densities using Gaussian kernels, and uses Monte Carlo sampling to approxi- 
mate the KL divergence between them. This algorithm is fairly generic an extension 
of nearest neighbors to function approximation in density space, where densities are 
represented by samples. Space limitations preclude us from providing further detail 
(see [24, 33]). 


16.4.2 Experimental Results 


Preliminary results have been obtained in a world shown in two domains, one synthetic 
and one using a simulator of a RWI B21 robot. 


1 Strictly speaking, KL divergence is not a distance metric, but this is ignored here. 
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Figure 16.4 Find and fetch task: (a) The mobile robot with gripper and camera, holding 
the target object (experiments are carried out in simulation!), (b) three successful runs (tra- 
jectory projected into 2D), and (c) success rate as a function of number of planning steps. 


In the synthetic environment (Figure 16.3a), the agents starts at the lower left corner. 
Its objective is to reach “heaven” which is either at the upper left corner or the lower 
right corner. The opposite location is “hell.” The agent does not know the location of 
heaven, but it can ask a “priest” who is located in the upper right corner. Thus, an op- 
timal solution requires the agent to go first to the priest, and then head to heaven. The 
state space contains a real-valued (coordinates of the agent) and discrete (location of 
heaven) component. Both are unobservable: In addition to not knowing the location of 
heaven, the agent also cannot sense its (real-valued) coordinates. 5% random motion 
noise is injected at each move. When an agent hits a boundary, it is penalized, but it is 
also told which boundary it hit (which makes it possible to infer its coordinates along 
one axis). However, notice that the initial coordinates of the agent are known. 


The optimal solution takes approximately 25 steps; thus, a successful POMDP planner 
must be capable of looking 25 steps ahead. We will use the term “successful policy” to 
refer to a policy that always leads to heaven, even if the path is suboptimal. For a policy 
to be successful, the agent must have learned to first move to the priest (information 
gathering), and then proceed to the right target location. 


Figures 16.3b&c show performance results, averaged over 13 experiments. The solid 
(black) curve in both diagrams plots the average cumulative reward J as a function 
of the number of training episodes (Figure 16.3b), and as a function of the number of 
backups (Figure 16.3c). A successful policy was consistently found after 17 episodes 
(or 6,150 backups), in all 13 experiments. In our current implementation, 6,150 back- 
ups require approximately 29 minutes on a Pentium PC. In some experiments, a suc- 
cessful policy was identified in 6 episodes (less than 1,500 backups or 7 minutes). 
After a successful policy is found, further learning gradually optimizes the path. To 
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investigate scaling, we doubled the size of the environment (quadrupling the size of 
the state space), making the optimal solution 50 steps long. The results are depicted 
by the gray curves in Figures 16.3b&c. Here a successful policy is consistently found 
after 33 episodes (10,250 backups, 58 minutes). In some runs, a successful policy is 
identified after only 14 episodes. 


We also applied MC-POMDPs to a robotic locate-and-retrieve task. Here a robot 
(Figure 16.42) is to find and grasp an object somewhere in its vicinity (at floor or table 
height). The robot's task is to grasp the object using its gripper. It is rewarded for 
successfully grasping the object, and penalized for unsuccessful grasps or for moving 
too far away from the object. The state space is continuous in x and y coordinates, 
and discrete in the object's height. 


The robot uses a mono-camera system for object detection; hence, viewing the object 
from a single location is insufficient for its 3D localization. Moreover, initially the 
object might not be in sight of the robot's camera, so that the robot must look around 
first. In our simulation, we assume 3046 general detection error (false-positive and 
false-negative), with additional Gaussian noise if the object is detected correctly. The 
robot's actions include turns (by a variable angle), translations (by a variable distance), 
and grasps (at one of two legal heights). Robot control is erroneous with a variance 
of 20% (in z-y-space) and 5% (in rotational space). Typical belief states range from 
uniformly distributed sample sets (initial belief) to samples narrowly focused on a 
specific z-y-z location. 


Figure 16.4c shows the rate of successful grasps as a function of iterations (actions). 
While initially, the robot fails to grasp the object, after approximately 4,000 iterations 
its performance surpasses 80%. Here the planning time is in the order of 2 hours. 
However, the robot fails to reach 10096. This is in part because certain initial con- 
figurations make it impossible to succeed (e.g., when the object is too close to the 
maximum allowed distance), in part because the robot occasionally misses the object 
by a few centimeters. Figure 16.4b depicts three successful example trajectories. In 
all three, the robot initially searches the object, then moves towards it and grasps it 
successfully. 


16.5 AUGMENTED MARKOV DECISION 
PROCESSES 


So far, we have studied two frameworks and algorithms for action selection under 
uncertainty: MDP and POMDP algorithms. Both frameworks accommodate non- 
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deterministic action outcomes. Only the POMDP algorithms can cope with uncer- 
tainty in perception, whereas MDP algorithms assume that the state is fully observ- 
able. On the other hand, MDP algorithms are polynomial in the number of states, 
whereas POMDP algorithms are doubly exponential. 


Both MDP and POMDP are in fact extreme ends of a spectrum of possible probabilis- 
tic algorithms. MDP algorithms ignore the issue of uncertainty in a robot's belief en- 
tirely. In contrast, POMDPs consider all possible belief functions, even though many 
of them might be extremely unlikely in practice. This raises the question of the middle 
ground: Are there probabilistic algorithms that can accommodate some of the robot's 
uncertainty, yet are computationally more efficient than the full POMDP solution? 


The answer is yes. In many robotics domains, the range of belief distributions the 
robot might encounter is a small subspace of the space of all belief distributions that 
can be defined over the state space. This insight enables us to devise probabilistic plan- 
ning algorithms that are significantly more efficient than solutions to the full POMDP 
problem, which still coping with robot beliefs in adequate ways. 


16.5.1 The Augmented State Space 


The augmented MDP algorithm (AMDP) applies to situations where the belief state 
can be summarized by a lower-dimensional statistic. Values and actions are calculated 
from this statistic, instead of the full belief b. The smaller the statistic, the more 
efficient the resulting algorithm. 


In many situations a good choice of the statistic is the tuple 


b = (argmaxb(s); H[b]) (16.80) 
where argmax, b(s) is the most likely state under the belief distribution b, and 
Alb) = - | b(s) In b(s) ds (16.81) 


is the entropy of the belief distribution. Calculating a value function parameterized on 
b, instead of b, is similar to the MDP approach assuming that the most likely state is 
the actual state of the world. It differs, however, by a one-dimensional statistic that 
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characterizes the amount of uncertainty in the state estimate. The entire uncertainty— 
usually an infinitely dimensional quantity—is thus summarized by a one-dimensional 
quantity: the entropy. The representation is mathematically justified if b is a sufficient 
statistic of b with regards to the estimation of value, that is: 


C(b) = C) (16.82) 


for all b the robot may encounter. In practice, this assumption will rarely hold true. 
However, the resulting value function might still be good enough for a sensible choice 
of action. Alternatively, one might consider different statistics, such as moments of 
the belief distribution (mean, variance, ...), modes, and so on. 


16.5.2 Value Iteration in AMDPs 


Value iteration can easily be applied to the augmented MDP model. Let f be the 
function that extracts the statistic from b, that is 


b = f(b) (16.83) 


for arbitrary beliefs b. Then the POMDP value iteration equation (16.76) can be rewrit- 
ten as 


C(f(b)) = max f [EBO f ols) B(b, a, o')(s') ds’ 
f Pes) Ке? s) b(s) ds ds' do' (16.84) 


This suggests an algorithm similar to the one in Table 16.2, replacing the value func- 
tion Ĉ by the concatenated function Ó f. This algorithm would still loop over all belief 
states b, but the estimation of the value function would be considerably faster due to 
the fact that many belief b share the same statistic f(b). 


Here we will formulate this algorithm similar to the algorithm MDP value iteration 
in Table 15.1. 
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1: Algorithm Augmented MDP value iteration(): 
2: for all b 
3: СФ) = 0 


А repeat until convergence 
5: for all b 


6: C(b) = max, flet) + ©(6)] PO Ja, 6); а 
7; return С 


Table 16.4 Тһе value iteration algorithm for augmented MDPs with fi nite state and action 
spaces. 


In analogy to the derivation in the Section on POMDPs, Section 16.3.1, we can calcu- 
late P(b'|a, b) as follows: 


P(W|a,) = J Pab P(b|b) db (16.85) 


J [o P(V/|o', a,b) P(o'|a, b) do db 
(ШЕЕ P(b'|o',a,b) P(o'|s") P(s'|a, b) ds’ do db 
(ШЕЕ P(V'|o', a,b) P(o'|s") P(s'|a, s) P(s|b) ds а do db 


= (ШЕЕ T¢(B(0',a,b))=6' P(o'|s’) P(s'|a, s) P(s|b) ds аз do db 


I 


The function 7 is the indicator function whose value is 1 iff its condition is true. The 
resulting conditional probability P(b'|a, b) is the motion model, formulated in the 
space defined by b. 


Similarly, the expected costs can be expressed as a function of b as follows: 
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= (ШЕ c(s); (в) ds db (16.86) 


The resulting value iteration algorithm for the augmented MDP is shown in Table 16.4. 
Notice that this algorithm is essentially the same as the one derived for MDPs. It is also 
highly related, though less obvious, to the general POMDP value iteration algorithm 
in Table 16.2. Lines 2 and 3 in Table 16.4 initialize the value function, defined over 
the space of all belief statistics b. Lines 4 through 6 calculate the value function, using 
the probability P(b’|a, b) defined above. 


The resulting algorithm still appears inefficient, due to the difficulty of computing 
P('|a, b). 


[reword the following] 


We have employ two strategies for speeding up the calculation. Both are approximate. 
First, we have replaced the exhaustive integration in Equations (16.85) and (16.86) 
by a stochastic integration, in which the belief b, the states s and s’ and the observa- 
tion o' are drawn at random from the corresponding distributions. Second, the model 
P(U'|a, b) was cached using a look-up table, avoiding computing the same probabil- 
ity twice. This leads to an explicit preprocessing phase where the familiar proba- 
bilistic models P(o|s) and P(s'|s, a) are ‘translated’ into the augmented state model 
P('|a, b), for which the subsequent application of value iteration is then straightfor- 
ward. 


16.5.3 Illustration 


The augmented state model considers the robot's uncertainty when executing actions. 
In particular, it characterizes the increase and decrease of certainty upon moving and 
sensing. This enables the robot to anticipate and avoid situations with increased danger 
of loosing critical state information. 


In the context of mobile robot navigation, the augmented MDP algorithm anticipates 
the increase and decrease of uncertainty in a robot's pose estimate. For example, a 
robot traversing a large featureless area is likely to gradually loose information as to 
where it is. This is reflected in the conditional probability P(b' |a, b), which with high 
likelihood increases the entropy of the belief in such areas. In areas populated with 
localization features, e.g., near the walls, the uncertainty is more likely to decrease. 
Planners using the augmented MDP can anticipate such situations, and generate poli- 
cies that minimize the time of arrival while simultaneously maximize the certainty at 
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Figure 16.5 Examples of robot paths in a large, open environment, for two different con- 
fi gurations (top row and bottom row). The diagrams (a) and (c) are paths generates by a 
conventional dynamic programming path planner that ignores the robot's perceptual un- 
certainty. The diagrams (b) and (d) are obtained using the augmented MDP planner, which 
anticipates uncertainty and avoids regions where the robot is more likely to get lost. In anal- 
ogy to ships who often stay close to the shore for orientation, the augmented MDP planner 
is known as ‘coastal planner.’ 


the time of arrival at a goal location. Since the uncertainty is an estimate of the true 
positioning error, it is a good measure for the chances of actually arriving at the desired 
location. 


In the context of mobile robot navigation, the augmented MDP algorithm is also 
known as coastal navigation. This name indicates the resemblance to ships, which 
often stay close to the coastline so as to not loose track of their location (unless, of 
course, the ship is equipped with a global navigation system). 
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Figure 16.6 Performance comparison of MDP planning and Augmented MDP planning. 
Shown here is the uncertainty (entropy) at the goal location as a function of the sensor range. 


Figure 16.5 shows example trajectories for two constellations (two different start and 
goal locations). The diagrams on the left correspond to a MDP planner, which does not 
consider the robot's uncertainty. The augmented MDP planner generates trajectories 
like the ones shown on the right. In Figures 16.5a&b, the robot is requested to move 
through a large open area in a museum (approximately 40 meters wide). he MDP 
algorithm, not aware of the increased risk of getting list in the open area, generates 
a policy that corresponds to the shortest path form the start to the goal location. The 
coastal planner, in contrast, generates a policy that stays close to the obstacles, where 
the robot has an increased chance of receiving informative sensor measurements at the 
expense of an increased travel time. Similarly, Figure 16.5c&d considers a situation 
where the goal location is close to the center of the featureless, open area. Here the 
coastal planner recognizes that passing by known objects reduces the pose uncertainty, 
increasing the chances of successfully arriving at the goal location. 


Figure 16.6 shows a performance comparison between the coastal navigation strategy 
and the MDP approach. In particular, it depicts the entropy of the robot's belief b 
at the goal location, as a function of the sensor characteristics. In this graph, the 
maximum perceptual range is varied, to study the effect of impoverished sensors. As 
the graph suggests, the coastal approach has significantly higher chances of success. 
The difference is largest if the sensors are very poor. For sensors that have a long 
range, the difference ultimately disappears. The latter does not come as a surprise, 
since with good range sensors the amount of information that can be perceived is less 
dependent on the specific pose of the robot. 
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16.6 SUMMARY 


In this section, we introduced three major algorithms for action selection under uncer- 
tainty. 


= Тһе Markov Decision Process (MDP) framework addresses the problem of action 
selection in situations where the outcomes of actions are stochastic, but the robot 
knows its state with absolute certainty. 


m Value iteration solves discrete MDPs by computing a value function over states. 
Selecting actions by greedily maximizing value leads to optimal action choices. 
A key characteristic of value functions is that they induce policies for action 
selection that are defined over the entire state space. No matter what the outcome 
of an action is, the robot knows what to do. 


m The more general framework of Partially Observable Markov Decision Processes 
(POMDPs) addresses the question of action selection with imperfect sensors. 
POMDP algorithms have to base decisions on belief states. Solutions trade off 
information gathering (exploration) and exploitation. 


m Value iteration can also be applied to solve POMDPs. In worlds with finitely 
many states, actions, and observations, the value function is piecewise linear in 
the parameters of the belief space. However, in the worst case calculating the 
value function is doubly exponential in the planning horizon. 


= Finally, we introduced the Augmented Markov Decision Process (AMDP) frame- 
work, which blends MDPs and POMDPs. AMDPs represent the robot's uncer- 
tainty by a low-dimensional statistic (e.g., the entropy), opening the door to effi- 
cient planning algorithms that consider some of the uncertainty in state estima- 
tion. 


m Value iteration was also applied to AMDPS. Within mobile robot navigation, the 
resulting algorithm is known as coastal navigation, since it navigates robots in а 
way that minimizes the anticipated uncertainty in localization. 


16.7 BIBLIOGRAPHICAL REMARKS 


16.8 PROJECTS 


1. ??? 
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